Abstract

Fairness  is  a  mathematical  abstraction  used  in  the  modeling  of  a  wide  range  of 
phenomena,  including  concurrency,  scheduling,  and  probability.  In  this  paper,  we 
study  fairness  in  the  context  of  probabilistic  systems,  and  we  introduce  probabilistic 
fairness,  a  novel  notion  of  fairness  that  is  itself  defined  in  terms  of  probability.  The 
definition  of  probabilistic  fairness  makes  it  invariant  with  respect  to  synchronous 
composition,  and  facilitates  the  design  of  model-checking  algorithms  for  quantitative 
properties  of  probabilistic  systems.  We  compare  probabilistic  fairness  with  other 
notions  of  fairness  for  probabilistic  systems,  and  we  provide  algorithms  that  solve 
the  verification  problem  for  various  classes  of  probabilistic  properties  on  finite-state 
systems  with  fairness. 


1  Introduction 

The use of formal methods for the analysis and verification of systems requires
a mathematical model of the system being studied. Many system models include
nondeterminism, which enables the representation of interleaving concurrency,
and the modeling of schedulers and of partially unknown or unspecified
components. Fairness is a constraint on the resolution of the nondeterministic
choices, and it has been introduced to represent a multiplicity of related
phenomena, such as the progress of threads of computation, general
environments, the behavior of probabilistic choice, and the impartiality of
arbiters and schedulers. Several notions of fairness have been presented, each
tailored to the modeling of some class of phenomena; [20,15,19] present general
overviews.

In the context of non-probabilistic systems, a notion of fairness is usually
defined by specifying the set p of system paths that are considered fair, where
a “path” is defined as an infinite sequence of states, or as an infinite
sequence of alternated states and transitions. The semantics of the system is
defined in terms of the subset p of fair paths only: the paths outside p are
not interpreted as possible system behaviors. For example, consider a system in
which at a state s the choice between two alternatives a and b is possible, and
assume that this choice is required to be fair. The two alternatives might
represent the choice of servicing the requests coming from either one of two
processes. According to the notion of strong fairness, the set p of fair paths
consists of all the paths that choose both a and b infinitely often, whenever s
is visited infinitely often. In the example, strong fairness enables the study
of the system under the assumption that the scheduling algorithm does not
eventually cease to schedule the requests originating from one of the two
processes. Other notions of fairness, such as weak fairness and α-fairness, are
specified by providing different definitions for the set p of fair paths
[21,24].

In this paper, we study systems in which both probabilistic and
nondeterministic behavior coexist; for brevity, we call these systems
probabilistic systems. As in other types of systems, fairness in probabilistic
systems is a constraint on the resolution of the nondeterministic choices.
However, fairness in probabilistic systems is defined differently than in
purely nondeterministic systems, since the apparatus required to deal with both
probabilistic and nondeterministic choice is more complex than the one required
for nondeterminism alone.

Consider a system where nondeterministic choice coexists with probabilistic
choice, and assume that at a given state s the nondeterministic choice between
two alternatives a and b is possible. Following [16,31], we model the
resolution of the nondeterministic choice by a scheduler, which we call a
policy, that at s selects one of a, b. Unlike [22,16,31], however, we consider
randomized policies rather than deterministic ones, following the customary
approach in the theory of Markov decision processes [14], as well as the
approach of [29,28]. Each time the system is at s, the (randomized) policy
dictates the probabilities of choosing a and b, possibly as a function of the
system’s past. Since nondeterminism is resolved by the policies, in
probabilistic systems fairness is usually expressed by specifying a set T of
fair policies. Again, during the analysis of system properties, only fair
policies are considered.

The notions of fairness that have been proposed so far for probabilistic
systems are the direct counterparts of notions proposed for purely
nondeterministic systems [16,31,17]. Given a notion of fairness for
nondeterministic systems specified as a set p of fair paths, the corresponding
notion for probabilistic systems is obtained by defining a policy to be fair
iff all the paths arising from the policy (except perhaps for a set of measure
0) belong to p. Hence, each notion of fairness p defined as a set of paths
gives rise to a corresponding notion Φ(p) defined on policies. Consider again
our system where the alternatives a and b must be fairly chosen at a state s.
According to the notion of fairness that corresponds to strong fairness, a
policy is fair iff all the paths that arise from it (except perhaps for a set
of measure 0) are such that, if s is visited infinitely often, both a and b are
chosen infinitely often. This is one of the notions of fairness described in
[31,17].

In this paper we introduce a novel notion of fairness, called probabilistic
fairness. Unlike previous notions of fairness, probabilistic fairness is a
local notion of fairness: it is expressed directly in terms of the behavior of
the policies at the various states, and it has no counterpart as a requirement
on paths. According to probabilistic fairness, a policy is fair iff there is an
$\varepsilon > 0$ such that all fair alternatives are chosen with probability
at least $\varepsilon$ by the policy. In our previous example, a policy is fair
iff the probability with which the alternatives a and b are chosen at s is
bounded below by $\varepsilon > 0$. We note that, while $\varepsilon$ can vary
from one policy to the other, it must be constant for each policy, rather than
dependent on the state of the system or on its past history. Probabilistic
fairness entails several benefits over previous notions of fairness for
probabilistic systems. These benefits are both semantical, concerning the
modeling of systems, and algorithmic, concerning the algorithms for system
verification.


1.1  Semantical  benefits 

Probabilistic fairness offers three semantical benefits: it provides a simple
way of representing probabilistic choice while abstracting from the numerical
values of probability; it exhibits a simple form of invariance with respect to
synchronous composition; and it enables the representation of threads of
computation in which the ratio between the speeds of computation is unknown,
but bounded.

Representation  of  probabilistic  choice 

Representing the qualitative properties of probabilistic choice, while
abstracting from the values of the transition probabilities, has two purposes.
First, it enables the modeling of probabilistic behavior in the cases in which
the probabilities of some alternatives are not known, except for the fact that
they are positive. This can be useful whenever the probabilities have not been
measured accurately, or when the portion of the system giving rise to the
probabilistic behavior has not been designed yet. Second, probability provides
a reference model for schedulers that are completely impartial with respect to
the incoming requests. Indeed, several fairness notions that have been
introduced to model schedulers, such as strong fairness, event and process
fairness, and interaction fairness, exclude the set of paths that have 0
probability under the purely probabilistic scheduling of the steps, events, or
process interactions that occur along the paths [15,19].

The problem of finding a notion of fairness that corresponds to the
qualitative properties of probabilistic choice was considered already in [22].
With respect to the verification of linear-time temporal logic properties (and
more generally, membership in ω-regular languages), the problem was settled
with the introduction of α-fairness, a fairly complex notion of fairness [24].
Probabilistic fairness offers a straightforward solution to this problem, since
it is defined directly in terms of probabilities. While the adoption of
probabilistic fairness seems to contradict the goal of eliminating probability
from the system model, we will show that the model-checking algorithms for
probabilistic fairness do not incur any additional complexity due to its
probability-based definition.

Synchronous  composition 

Synchronous composition is a basic step in the modeling and verification of
systems: it can be used to construct the complete system from smaller component
systems, and the synchronous composition of the system with an automaton
derived from the specification is at the heart of several verification
algorithms [31,23,5,6]. Probabilistic fairness exhibits a simple invariance
property with respect to synchronous composition.

If two systems $V$ and $Q$ are non-interacting, and if a policy $\pi_V$ for $V$
is probabilistically fair, then the policy $\pi_{V\|Q}$ obtained by projecting
$\pi_V$ onto the synchronous composition $V\|Q$ of $V$ and $Q$ is also
probabilistically fair.

This invariance property states that the fairness of a policy for a given
system does not depend on whether the system is considered in isolation, or
together with other non-interacting systems. While some notions of fairness
satisfy the above invariance (notably α-fairness), this is not the case for
some of the most common notions, such as weak and strong fairness [21]. The
fact that probabilistic fairness satisfies this invariance property is a direct
consequence of the local nature of its definition.

Progress  of  independent  threads  of  computation 

Probabilistic fairness enables the modeling of the progress of independent
threads of computation, in which the ratio between the speeds of computation is
unknown, but bounded. In the context of timed probabilistic systems,
probabilistic fairness also enables the modeling of transitions having finite,
but unknown, average delay, as discussed in detail in [11]. In these respects,
probabilistic fairness is related to finitary fairness, a (non-probabilistic)
notion of fairness proposed for reasoning about distributed algorithms [1].

1.2 Algorithmic benefits

The solution of many verification problems for probabilistic systems consists
in determining a policy that is optimal (or pessimal) with respect to the
property of interest, and in checking whether the property holds for this
optimal or pessimal policy. When fairness is introduced in the system model,
the optimal (or pessimal) policy must be chosen from the set of fair policies,
rather than from the set of all policies. However, the optimization methods
available from the theory of Markov decision processes compute the optimal and
pessimal policies in the set of all policies, and they cannot be easily adapted
to conduct the optimization in the smaller set of fair policies [14,3]. To show
that the (unconstrained) solution of an optimization problem can be used in the
verification of fair probabilistic systems, we have to show that the optimal
values of the quantities of interest can be realized or at least approximated
by a set of fair policies, following the idea of [22,17].

The local definition of probabilistic fairness facilitates the construction of
such approximating policies, by ensuring that the convex combination of a
generic policy and a fair policy is a fair policy. To illustrate this point,
assume that the policies are memoryless, i.e. that the probabilities with which
the alternatives are chosen depend only on the current system state, and denote
by $\pi(s)(a)$ the probability with which alternative $a$ is selected at state
$s$. Given a generic policy $\pi_g$ and a fair policy $\pi_f$, their convex
combination $\pi[x]$ for $0 \le x \le 1$ is defined by

$$\pi[x](s)(a) = (1 - x)\,\pi_g(s)(a) + x\,\pi_f(s)(a)$$

for all states $s$ and all alternatives $a$. For $0 < x \le 1$, policy $\pi[x]$
is fair, and for $x = 0$ it coincides with $\pi_g$. Consider a function $h$
from policies to real numbers; the value $h(\pi)$ can represent for example a
performance index of the system under policy $\pi$. To show that the value of
the performance index corresponding to $\pi_g$ can be approximated by fair
policies, it suffices to prove that
$\lim_{x \to 0} h(\pi[x]) = h(\pi[0]) = h(\pi_g)$. Often, this proof can be
carried out using standard methods from calculus and linear algebra. With minor
variations, this approach to the construction of approximating policies will be
used to justify all the verification algorithms presented in the paper.
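
The convex-combination construction can be illustrated with a minimal Python sketch. The dictionary encoding of memoryless policies, the function names, and the one-state example are assumptions made for illustration only; they do not come from the paper.

```python
from typing import Dict, Set

# A memoryless policy maps each state to a distribution over actions.
Policy = Dict[str, Dict[str, float]]


def convex_combination(pi_g: Policy, pi_f: Policy, x: float) -> Policy:
    """Return pi[x] = (1 - x) * pi_g + x * pi_f, computed state by state."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    mixed: Policy = {}
    for s in pi_g:
        actions = set(pi_g[s]) | set(pi_f[s])
        mixed[s] = {
            a: (1 - x) * pi_g[s].get(a, 0.0) + x * pi_f[s].get(a, 0.0)
            for a in actions
        }
    return mixed


def min_fair_probability(pi: Policy, fair: Dict[str, Set[str]]) -> float:
    """Smallest probability given to any fair action (a candidate epsilon)."""
    return min(pi[s].get(a, 0.0) for s in fair for a in fair[s])


# Hypothetical one-state example: pi_g never picks b, pi_f is uniform.
pi_g = {"s": {"a": 1.0, "b": 0.0}}
pi_f = {"s": {"a": 0.5, "b": 0.5}}
fair = {"s": {"a", "b"}}

for x in (0.5, 0.1, 0.01):
    pi_x = convex_combination(pi_g, pi_f, x)
    # Each fair action keeps probability at least x * 0.5, which is positive.
    assert min_fair_probability(pi_x, fair) >= x * 0.5 - 1e-12
```

In this sketch the lower bound for the mixed policy is x times the bound of the fair policy, so it stays positive for every x > 0 while the mixed policy approaches the generic one as x tends to 0.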


1.3  Paper  outline 

After providing a standard definition for probabilistic systems, we introduce
three notions of fairness. The first one is probabilistic fairness; the second
one is unbounded fairness, a weaker variant of probabilistic fairness that
shares some of its properties, and the third one is path fairness, which is
essentially the notion studied in [31,17]. We show that probabilistic and
unbounded fairness, unlike path fairness, are invariant with respect to
synchronous composition. We then compare the three notions of fairness with
respect to three classes of properties:

Maximum acceptance probability. This class of properties concerns the
maximum probability with which a path satisfies the Rabin acceptance condition
of an ω-automaton, and it is related to the maximum probability of satisfying
linear-time temporal logic formulas.




Minimum reachability cost. This class of properties concerns the minimum
expected cost for reaching a subset of target states. The cost can represent
various quantities of interest, such as the amount of time elapsed before the
target is reached.

Maximum  long-run  average  outcome.  This  class  of  properties  is  related 
to  the  long-run  average  outcome  of  system  tasks,  such  as  the  request  for  a 
resource,  or  the  sending  of  a  message.  Long-run  average  properties  enable 
the  specification  of  many  classical  performance  and  reliability  indices  [10]. 

We  show  that  probabilistic  fairness  is  equivalent  to  path  fairness  with  respect 
to  the  maximum  acceptance  probability  and  the  long-run  average  outcome 
classes  of  properties,  and  it  is  equivalent  to  unbounded  fairness  with  respect 
to  the  minimum  reachability  cost  class.  Finally,  for  each  of  these  notions 
of  fairness  and  classes  of  properties  we  present  model-checking  algorithms 
that  can  be  used  to  solve  the  verification  problem  on  finite-state  probabilistic 
systems. 

2  Probabilistic  Systems  and  Fairness 

Our model for probabilistic systems is based on Markov decision processes
(MDPs). An MDP is a generalization of a Markov chain in which a set of possible
actions is associated with each state. To each state-action pair corresponds a
probability distribution on the states, which is used to select the successor
state [14]. Markov decision processes are closely related to the probabilistic
automata of [25], the concurrent Markov chains of [31], and the simple
probabilistic automata of [29,28].

Given a countable set $C$ we denote by $\mathcal{D}(C)$ the set of probability
distributions over $C$, i.e. the set of functions $f : C \mapsto [0,1]$ such
that $\sum_{x \in C} f(x) = 1$. An MDP $V = (S, \mathit{Acts}, A, p)$ consists
of the following components:

(i) A set $S$ of states.

(ii) A set $\mathit{Acts}$ of actions.

(iii) A function $A : S \mapsto 2^{\mathit{Acts}}$, which associates with each
$s \in S$ a finite set $A(s) \subseteq \mathit{Acts}$ of actions available at
$s$.

(iv) A function $p : S \times \mathit{Acts} \mapsto \mathcal{D}(S)$, which
associates with each $s, t \in S$ and $a \in A(s)$ the probability $p(s,a)(t)$
of a transition from $s$ to $t$ when action $a$ is selected.
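
The four components listed above can be encoded directly as a data structure. The following Python sketch is only one possible encoding: the class name `MDP`, its field names, the `validate` helper, and the two-state example are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple


@dataclass
class MDP:
    states: FrozenSet[str]                           # S
    actions: FrozenSet[str]                          # Acts
    enabled: Dict[str, FrozenSet[str]]               # A(s), a subset of Acts
    trans: Dict[Tuple[str, str], Dict[str, float]]   # p(s, a)(t)

    def validate(self, tol: float = 1e-9) -> None:
        """Check that A(s) is within Acts and each p(s, a) is a distribution."""
        for s in self.states:
            if not self.enabled[s] <= self.actions:
                raise ValueError(f"A({s}) is not a subset of Acts")
            for a in self.enabled[s]:
                dist = self.trans[(s, a)]
                if not set(dist) <= self.states:
                    raise ValueError(f"p({s}, {a}) has an unknown successor")
                if any(q < 0 for q in dist.values()):
                    raise ValueError(f"p({s}, {a}) has a negative entry")
                if abs(sum(dist.values()) - 1.0) > tol:
                    raise ValueError(f"p({s}, {a}) does not sum to 1")


# Hypothetical example: at s0, action a flips a coin and action b moves to s1.
example = MDP(
    states=frozenset({"s0", "s1"}),
    actions=frozenset({"a", "b"}),
    enabled={"s0": frozenset({"a", "b"}), "s1": frozenset({"a"})},
    trans={
        ("s0", "a"): {"s0": 0.5, "s1": 0.5},
        ("s0", "b"): {"s1": 1.0},
        ("s1", "a"): {"s1": 1.0},
    },
)
example.validate()
```

Storing each distribution sparsely (successors with probability 0 are simply absent) is a design choice of the sketch, not a requirement of the definition.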

We will often associate with an MDP additional labelings to represent
quantities of interest; the labelings will be simply added to the list of
components.

A path of an MDP is an infinite sequence $\theta : s_0, a_0, s_1, a_1, \ldots$
of alternating states and actions, such that $s_i \in S$, $a_i \in A(s_i)$ and
$p(s_i, a_i)(s_{i+1}) > 0$ for all $i \ge 0$. For $i \ge 0$, the sequence is
constructed by iterating a two-phase selection process. First, an action
$a_i \in A(s_i)$ is selected nondeterministically; second, the successor state
$s_{i+1}$ is chosen according to the probability distribution $p(s_i, a_i)$.
Given a path $\theta : s_0, a_0, s_1, a_1, \ldots$ and $k \ge 0$, we denote by
$X_k(\theta)$, $Y_k(\theta)$ its $k$-th state $s_k$ and its $k$-th action
$a_k$, respectively.

For every state $s \in S$, we denote by $\Theta_s$ the set of (infinite) paths
having $s$ as initial state, and we denote by $\Sigma_s$ the set of finite path
prefixes having $s$ as initial state. The set of all paths is
$\Theta = \bigcup_{s \in S} \Theta_s$. Given two paths (or path prefixes)
$\theta_1$ and $\theta_2$, we denote by $\theta_1 \preceq \theta_2$ the fact
that $\theta_1$ is a prefix of $\theta_2$. Following the classical definition
of [18], we let $\mathcal{B}_s \subseteq 2^{\Theta_s}$ be the σ-algebra of
measurable subsets of $\Theta_s$, defined as the smallest algebra that contains
all the cylinder sets $\{\theta \in \Theta_s \mid \sigma \preceq \theta\}$, for
$\sigma$ that ranges over $\Sigma_s$, and that is closed under complementation
and countable unions (and hence also countable intersections). The elements of
$\mathcal{B}_s$ are called events, and they are the measurable sets of paths to
which we will associate a probability.

2.1  Policies 

To assign a probability to the events in $\mathcal{B}_s$, for all $s \in S$, we
need to specify the criteria with which the actions are chosen. To this end, we
use the concept of policy [14], closely related to the schedulers of [31] and
to the adversaries of [29,28]. Denoting with $S^+$ the set of non-empty finite
sequences of states, a policy $\pi$ is a mapping
$\pi : S^+ \mapsto \mathcal{D}(\mathit{Acts})$, which associates with each
sequence of states $s_0, s_1, \ldots, s_n \in S^+$ and each $a \in A(s_n)$ the
probability $\pi(s_0, s_1, \ldots, s_n)(a)$ of choosing $a$ after following the
sequence of states $s_0, s_1, \ldots, s_n$. We require that
$\pi(s_0, s_1, \ldots, s_n)(a) > 0$ implies $a \in A(s_n)$: a policy can choose
only among the actions that are available at the state where the choice is
made. We indicate with $\Pi$ the set of all policies. According to this
definition, policies are randomized, differently from the schedulers of
[31,23], which are deterministic. The consideration of randomized policies is
fundamental to the further developments of this paper. From these definitions,
the probability of following a finite path prefix
$s_0, a_0, s_1, a_1, \ldots, s_n$ under policy $\pi \in \Pi$ is given by

$$\prod_{i=0}^{n-1} p(s_i, a_i)(s_{i+1})\,\pi(s_0, \ldots, s_i)(a_i)\,.$$

These probabilities for prefixes give rise to a unique probability measure on
$\mathcal{B}_s$. For a set of paths $A \subseteq \bigcup_{s \in S} \Theta_s$,
we write $\Pr_s^\pi(A)$ to denote the probability of event $A \cap \Theta_s$
starting from the initial state $s \in S$ under policy $\pi$. For example,
given a set $R \subseteq S$ of states, we denote by

$$(\Diamond R) = \{\theta \in \Theta \mid \exists k \ge 0 \,.\, X_k(\theta) \in R\}$$

the event of reaching $R$. The probability of reaching $R$ starting from state
$s$ under policy $\pi$ is then $\Pr_s^\pi(\Diamond R)$. Similarly, for all
$s \in S$, if $f : \Theta_s \mapsto \mathbb{R}$ is a measurable function, we
denote by $E_s^\pi\{f\}$ the expectation of $f$ from state $s$ under policy
$\pi$. For example, given a set $R \subseteq S$, for all paths
$\theta \in \Theta$ we denote by

$$T_R(\theta) = \min\{k \mid X_k(\theta) \in R\}$$

the first-passage time of $\theta$ in $R$, with the convention that
$\min \emptyset = \infty$. For all $s \in S$ the function
$T_R : \Theta_s \mapsto \mathbb{R}$ is measurable, and the expected
first-passage time in $R$ from $s \in S$ under policy $\pi$ is written as
$E_s^\pi\{T_R\}$. Note that we omitted the argument $\theta$ of the random
function $T_R(\theta)$: for conciseness, here and in the following we omit the
generic path $\theta$ that is the argument of random functions whenever we take
expectations or probability measures.
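
The probability of a finite path prefix under a policy can be computed step by step. The sketch below assumes that each step contributes the probability with which the policy picks the action, multiplied by the transition probability of the resulting successor; the function names, the dictionary encoding, and the two-state example are hypothetical.

```python
from typing import Callable, Dict, Sequence, Tuple

Trans = Dict[Tuple[str, str], Dict[str, float]]
# A policy maps a non-empty history of states to a distribution over actions.
PolicyFn = Callable[[Tuple[str, ...]], Dict[str, float]]


def prefix_probability(states: Sequence[str], actions: Sequence[str],
                       trans: Trans, policy: PolicyFn) -> float:
    """Probability of following s_0, a_0, s_1, ..., s_n starting from s_0."""
    if len(states) != len(actions) + 1:
        raise ValueError("a prefix with n actions must contain n + 1 states")
    prob = 1.0
    for i, a in enumerate(actions):
        history = tuple(states[: i + 1])
        # Probability that the policy picks a_i after the history s_0..s_i.
        choice = policy(history).get(a, 0.0)
        # Probability that s_{i+1} follows s_i under a_i.
        step = trans.get((states[i], a), {}).get(states[i + 1], 0.0)
        prob *= choice * step
    return prob


# Hypothetical MDP: at s, action a loops and action b moves to t.
trans = {("s", "a"): {"s": 1.0}, ("s", "b"): {"t": 1.0}, ("t", "a"): {"t": 1.0}}


def uniform_policy(history: Tuple[str, ...]) -> Dict[str, float]:
    """Pick a or b with equal probability at s; only a is available at t."""
    return {"a": 0.5, "b": 0.5} if history[-1] == "s" else {"a": 1.0}


p = prefix_probability(["s", "s", "t"], ["a", "b"], trans, uniform_policy)
assert abs(p - 0.25) < 1e-12
```

The policy here happens to look only at the last state of the history, but the signature accepts the whole history, matching the definition of policies as mappings from non-empty state sequences.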

2.2  Fairness 

Given an MDP $V = (S, \mathit{Acts}, A, p)$, a fairness constraint $F$ for $V$
is a mapping $F : S \mapsto 2^{\mathit{Acts}}$ that associates with each
$s \in S$ a subset $F(s) \subseteq A(s)$ of fair actions at $s$. The intended
meaning is that the choice at $s$ among actions in $F(s)$ should be “fair.” The
various notions of fairness differ in the way in which this “fairness” is
defined. We denote by
$\mathit{SAPairs}(V) = \{(s,a) \mid s \in S \wedge a \in A(s)\}$ the set of
state-action pairs of the MDP [14]. Given a path $\theta$, we denote by

$$\mathit{InfS}(\theta) = \{s \in S \mid \exists^{\infty} k \,.\, X_k(\theta) = s\}$$

$$\mathit{InfSA}(\theta) = \{(s,a) \in \mathit{SAPairs}(V) \mid \exists^{\infty} k \,.\, (X_k(\theta), Y_k(\theta)) = (s,a)\}$$

the sets of states and of state-action pairs that are repeated infinitely often
along $\theta$, where the notation $\exists^{\infty} k$ is an abbreviation for
“there are infinitely many distinct values for $k$”. For each policy and each
initial state $s \in S$, the functions
$\mathit{InfS} : \Theta_s \mapsto 2^S$ and
$\mathit{InfSA} : \Theta_s \mapsto 2^{S \times \mathit{Acts}}$ are measurable.
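
For an ultimately periodic path, that is, a finite prefix followed by a cycle repeated forever, the two sets can be read off the cycle. The Python sketch below makes that restriction explicit; the representation of a path as two lists of (state, action) pairs and the function name `inf_sets` are assumptions of the example.

```python
from typing import List, Set, Tuple

Step = Tuple[str, str]  # one (state, action) pair (X_k, Y_k) of a path


def inf_sets(prefix: List[Step], cycle: List[Step]) -> Tuple[Set[str], Set[Step]]:
    """Return (InfS, InfSA) for the path made of prefix followed by cycle forever.

    Only the cycle matters: pairs in the prefix occur finitely often unless
    they also occur in the cycle.
    """
    if not cycle:
        raise ValueError("the cycle must be non-empty for the path to be infinite")
    inf_sa = set(cycle)
    inf_s = {s for s, _ in cycle}
    return inf_s, inf_sa


# Hypothetical path: s a, then (t b, u a) repeated forever.
states_inf, pairs_inf = inf_sets([("s", "a")], [("t", "b"), ("u", "a")])
assert states_inf == {"t", "u"}
assert pairs_inf == {("t", "b"), ("u", "a")}
```

General paths are not ultimately periodic, so this is an illustration of the definitions rather than a way to evaluate them on arbitrary paths.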

Path  fairness 

Path fairness essentially coincides with the fairness considered in [31], and
is called weak fairness in [17]. We say that a policy $\pi$ is path fair if,
for all initial states, the paths that arise under $\pi$ satisfy with
probability 1 the following condition: whenever a path visits infinitely often
a state $t$, each action in $F(t)$ is chosen infinitely often at $t$. More
precisely, $\pi$ is path fair with respect to constraint $F$ if, for all
initial states $s \in S$ and all state-action pairs
$(t,a) \in \mathit{SAPairs}(V)$ with $a \in F(t)$,

$$\Pr_s^\pi\bigl(t \in \mathit{InfS} \text{ implies } (t,a) \in \mathit{InfSA}\bigr) = 1\,.$$

We call this notion of fairness path fairness because the fairness of a policy
is established on the basis of the paths that arise under the policy. In
contrast, our next notions of fairness refer directly to the policies.
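
The condition imposed on individual paths can be checked mechanically when a path is ultimately periodic. The following Python sketch tests that condition on the repeated cycle of such a path; it does not decide path fairness of a policy, which additionally requires the condition to hold with probability 1. The function name and the examples are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Step = Tuple[str, str]  # one (state, action) pair of a path


def cycle_meets_fairness(cycle: List[Step], fair: Dict[str, Set[str]]) -> bool:
    """Check that every state repeated forever takes each of its fair actions.

    The cycle is the part of an ultimately periodic path that repeats forever,
    so its states and pairs are exactly those occurring infinitely often.
    """
    visited = {s for s, _ in cycle}
    taken = set(cycle)
    return all((t, a) in taken for t in visited for a in fair.get(t, set()))


fair = {"s": {"a", "b"}}
# Alternating a and b at s satisfies the condition.
assert cycle_meets_fairness([("s", "a"), ("s", "b")], fair)
# Choosing only a at s forever violates it, since b is fair at s.
assert not cycle_meets_fairness([("s", "a")], fair)
```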

Probabilistic  fairness  and  unbounded  fairness 

Probabilistic fairness is a local notion of fairness that refers directly to
the behavior of the policies at the various system states. Denote by $S^*$ the
set of finite (and possibly empty) sequences of states. A policy $\pi$ is
probabilistically fair with respect to the constraint $F$ if there is an
$\varepsilon > 0$ such that $\pi(\bar{s}, s)(a) \ge \varepsilon$ for all
$\bar{s} \in S^*$, all $s \in S$ and all $a \in F(s)$. In other words, a policy
$\pi$ is probabilistically fair with respect to $F$ if there is a lower bound
$\varepsilon > 0$ for the probability of choosing a fair action, throughout the
system’s behavior [8,11]. This requirement can also be written as:

$$\inf\{\pi(\bar{s}, s)(a) \mid \bar{s} \in S^* \wedge s \in S \wedge a \in F(s)\} > 0\,.$$

In the definition of probabilistic fairness, the bound $\varepsilon$ can depend
on the policy $\pi$, but it cannot depend on the past sequence $\bar{s}$ of
states. If $\varepsilon$ could depend on $\bar{s}$, then probabilistic fairness
would reduce to a very weak notion of fairness, which we call unbounded
fairness. A policy $\pi$ is unboundedly fair with respect to the constraint $F$
if we have

$$\pi(\bar{s}, s)(a) > 0$$

for all $\bar{s} \in S^*$, all $s \in S$, and all $a \in F(s)$.
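
The difference between the two definitions can be seen numerically. The Python sketch below is a hypothetical illustration: `fairness_margin` computes the smallest probability of a fair action for a memoryless policy on finitely many states, where the two notions coincide, and `decaying_b_prob` describes an assumed history-dependent policy whose fair-action probability is always positive but has infimum 0.

```python
from typing import Dict, Set

Policy = Dict[str, Dict[str, float]]  # memoryless: state -> action -> probability


def fairness_margin(pi: Policy, fair: Dict[str, Set[str]]) -> float:
    """Smallest probability that pi gives to a fair action.

    For a memoryless policy on finitely many states, a positive margin is a
    valid epsilon, so the policy is probabilistically (and unboundedly) fair.
    """
    probs = [pi[s].get(a, 0.0) for s in fair for a in fair[s]]
    return min(probs, default=1.0)


def decaying_b_prob(k: int) -> float:
    """Assumed probability of the fair action b after a history of length k."""
    return 1.0 / (k + 1)


fair = {"s": {"a", "b"}}
assert fairness_margin({"s": {"a": 0.7, "b": 0.3}}, fair) == 0.3
assert fairness_margin({"s": {"a": 1.0}}, fair) == 0.0

# Every history gives b positive probability (unbounded fairness holds) ...
assert all(decaying_b_prob(k) > 0 for k in range(1, 1001))
# ... but the minimum over longer and longer histories shrinks toward 0,
# so no single epsilon works and probabilistic fairness fails.
assert min(decaying_b_prob(k) for k in range(1, 1001)) == 1.0 / 1001
```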

3  Relations  Among  Fairness  Notions 

Given an MDP $V$ and a fairness constraint $F$ for $V$, we denote by
$\mathit{PathF}(V, F)$, $\mathit{ProbF}(V, F)$, and $\mathit{UnbF}(V, F)$ the
sets of policies that are fair according to path, probabilistic, and unbounded
fairness, respectively. We also indicate with $\mathit{NoF}(V) = \Pi$ the set
of all policies, corresponding to the notion of no fairness. In the following,
we omit the arguments $V$ and $F$ whenever they can be unambiguously understood
from the context. The following preliminary proposition characterizes the
hierarchy between these three fairness notions.

Proposition 1 The following assertions hold:

(i) For all MDPs $V$ and all fairness constraints $F$, we have
$\mathit{ProbF}(V, F) \subseteq \mathit{PathF}(V, F)$, and
$\mathit{ProbF}(V, F) \subseteq \mathit{UnbF}(V, F)$.

(ii) Unbounded fairness and path fairness are incomparable:

(a) There is an MDP $V$ and a fairness constraint $F$ such that
$\mathit{PathF}(V, F) \not\subseteq \mathit{UnbF}(V, F)$.

(b) There is an MDP $V$ and a fairness constraint $F$ such that
$\mathit{UnbF}(V, F) \not\subseteq \mathit{PathF}(V, F)$.


Proof. Assertion (i) follows immediately from the definitions of fairness.

The MDP $V$ of Figure 1 with its fairness constraint $F_V$ is a witness for
assertion (a). In fact, consider the policy $\pi$ defined for all $k > 0$ by
$\pi(s^k)(a) = 1$ if $k$ is even, and $\pi(s^k)(a) = 0$ if $k$ is odd (where
$s^k$ is the sequence consisting of $k$ states $s$). Then
$\pi \in \mathit{PathF}(V, F_V)$ and $\pi \notin \mathit{UnbF}(V, F_V)$.

The MDP $Q$ of Figure 1 with its fairness constraint $F_Q$ is a witness for
assertion (b). In fact, consider the policy $\pi$ defined for all $k > 0$ by
$\pi(s^k)(a) = 2^{-1/2^k}$. From this definition it follows immediately that
$\pi \in \mathit{UnbF}(Q, F_Q)$. To see that
$\pi \notin \mathit{PathF}(Q, F_Q)$, it suffices to note that under policy
$\pi$, a path that starts from $s$ is confined to $s$ (and takes only action
$a$) with probability 1/2. □

Fig. 1. Two MDPs $V$ and $Q$. The MDPs are deterministic, i.e. for each state
and action, there is only one successor state, indicated in the diagram by a
directed edge labeled with the action. The MDP $V = (S, \mathit{Acts}, A, p)$
is defined by $S = \{s\}$, $\mathit{Acts} = \{a, b\}$, $A(s) = \{a, b\}$, and
$p(s,a)(s) = p(s,b)(t) = 1$. The MDP $V$ has an associated fairness constraint
$F_V$ defined by $F_V(s) = \{a, b\}$. The MDP $Q$ is similarly defined, and it
has an associated fairness constraint $F_Q$ defined by $F_Q(s) = \{b\}$ and
$F_Q(t) = \emptyset$.
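
The value 1/2 in the argument for assertion (b) can be checked numerically. The sketch below assumes that the policy chooses $a$ with probability $2^{-1/2^k}$ after the history $s^k$, for $k \ge 1$, so that the probability of staying in $s$ for the first $n$ steps is $2^{-(1 - 2^{-n})}$; the function names are hypothetical.

```python
import math


def a_prob(k: int) -> float:
    """Assumed probability of action a after the history of k copies of s."""
    return 2.0 ** (-1.0 / 2.0 ** k)


def stay_probability(n: int) -> float:
    """Probability of choosing a at each of the first n visits to s."""
    return math.prod(a_prob(k) for k in range(1, n + 1))


# The product converges to 1/2, since the exponents 1/2 + 1/4 + ... sum to 1.
assert abs(stay_probability(60) - 0.5) < 1e-9

# Action b keeps positive probability at every history (checked up to k = 40,
# before floating-point rounding hides the difference from 1) ...
assert all(1.0 - a_prob(k) > 0.0 for k in range(1, 41))
# ... but that probability shrinks toward 0 as the history grows.
assert 1.0 - a_prob(40) < 1e-11
```

Under this assumption the policy is unboundedly fair, yet with probability 1/2 it never takes the fair action $b$, which is why it fails path fairness.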

3.1  Fairness  and  synchronous  composition 

Path fairness does not possess the same invariance properties as probabilistic
and unbounded fairness with respect to synchronous composition. In fact, it is
possible that a policy that is path fair for an MDP when considered in
isolation may not be path fair when the same MDP is considered composed
synchronously with a non-interacting automaton. Since the MDP and the automaton
do not interact, this means that the notion of path fairness is fragile, and
the path fairness of a policy depends on the “environment” at large in which
the system is studied. This undesirable characteristic is not shared by either
probabilistic or unbounded fairness. The synchronous composition of an MDP and
an automaton is important in verification, and the notion of α-fairness has
been in part proposed to overcome this limitation of path fairness [24].

There are many definitions for synchronous composition, depending on the
methods chosen for synchronizing the systems being composed. To emphasize that
the phenomenon is independent of the particular definition adopted, we focus
here on what is perhaps the simplest form of synchronous composition: the
synchronous product between an MDP and a deterministic finite-state automaton
with singleton input alphabet, where the MDP and the automaton are
non-interacting. Even though this type of synchronous product is thoroughly
trivial, it suffices to expose the different behavior of the various fairness
notions.

Given an MDP $V = (S, \mathit{Acts}, A, p)$ and an automaton $Q = (T, \delta)$
with $\delta : T \mapsto T$, we define their synchronous product to be the MDP
$V \| Q = (S \times T, \mathit{Acts}, B, q)$, where:

• for all $s \in S$ and $t \in T$, we have $B(s,t) = A(s)$.

• for all $s, s' \in S$, all $t, t' \in T$, and all $a \in A(s)$, the
probability $q((s,t), a)(s',t')$ is equal to $p(s,a)(s')$ if
$t' = \delta(t)$, and is equal to 0 otherwise.
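
The construction of the synchronous product can be sketched in Python as follows. The dictionary encoding, the function name `synchronous_product`, and the two-state automaton used as an example are assumptions made for illustration.

```python
from typing import Dict, Set, Tuple

Trans = Dict[Tuple[str, str], Dict[str, float]]
ProdState = Tuple[str, str]
ProdTrans = Dict[Tuple[ProdState, str], Dict[ProdState, float]]


def synchronous_product(
    states: Set[str],
    enabled: Dict[str, Set[str]],
    trans: Trans,
    aut_states: Set[str],
    delta: Dict[str, str],
) -> Tuple[Dict[ProdState, Set[str]], ProdTrans]:
    """Build the enabled-action map and transition map of the product MDP."""
    # The actions available at (s, t) are those available at s.
    prod_enabled = {(s, t): set(enabled[s]) for s in states for t in aut_states}
    prod_trans: ProdTrans = {}
    for s in states:
        for t in aut_states:
            for a in enabled[s]:
                # The automaton moves deterministically to delta[t]; successors
                # with a different automaton component get probability 0 and
                # are simply omitted from the sparse distribution.
                prod_trans[((s, t), a)] = {
                    (s2, delta[t]): q for s2, q in trans[(s, a)].items()
                }
    return prod_enabled, prod_trans


# Hypothetical one-state MDP composed with a two-state alternating automaton.
enabled = {"s": {"a", "b"}}
trans = {("s", "a"): {"s": 1.0}, ("s", "b"): {"s": 1.0}}
delta = {"t0": "t1", "t1": "t0"}

prod_enabled, prod_trans = synchronous_product({"s"}, enabled, trans, {"t0", "t1"}, delta)
assert prod_enabled[("s", "t1")] == {"a", "b"}
assert prod_trans[(("s", "t0"), "a")] == {("s", "t1"): 1.0}
```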

Corresponding to a fairness constraint $F_V$ for $V$, we define the fairness
constraint $F_{V\|Q}$ for $V\|Q$ by letting $F_{V\|Q}(s,t) = F_V(s)$ for all
$s \in S$ and $t \in T$. Corresponding to a policy $\pi_V$ for $V$, we define
the policy $\pi_{V\|Q}$ for $V\|Q$ by letting

$$\pi_{V\|Q}((s_0,t_0), (s_1,t_1), \ldots, (s_n,t_n)) = \pi_V(s_0, s_1, \ldots, s_n)$$

for all $n \ge 0$, all $s_0, s_1, \ldots, s_n \in S^+$, and all
$t_0, t_1, \ldots, t_n \in T^+$. With this notation, we can finally state the
following theorem.

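Theorem 1 The following assertions hold:

(i) There is an MDP $\mathcal{P}$ with a fairness constraint $\mathcal{F}_{\mathcal{P}}$, there is a deterministic
automaton $Q$ with singleton alphabet, and there is a policy
$\pi_{\mathcal{P}} \in \mathit{PathF}(\mathcal{P}, \mathcal{F}_{\mathcal{P}})$ such that $\pi_{\mathcal{P} \| Q} \notin \mathit{PathF}(\mathcal{P} \| Q, \mathcal{F}_{\mathcal{P} \| Q})$.

(ii) Consider a fairness notion $\Phi \in \{\mathit{ProbF}, \mathit{UnbF}\}$. For all MDPs $\mathcal{P}$ with
fairness constraint $\mathcal{F}_{\mathcal{P}}$, for all deterministic automata $Q$ with singleton
alphabet, and for all policies $\pi_{\mathcal{P}} \in \Phi(\mathcal{P}, \mathcal{F}_{\mathcal{P}})$, we have $\pi_{\mathcal{P} \| Q} \in \Phi(\mathcal{P} \| Q, \mathcal{F}_{\mathcal{P} \| Q})$.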


Proof. For the first assertion, consider the MDP $\mathcal{P}$ and the automaton $Q$ of
Figure 2. The portion of the synchronous product $\mathcal{P} \| Q$ that is reachable from
the state $(s_1, t_1)$ is also depicted in the figure. Consider the policy $\pi_{\mathcal{P}}$ defined
for all $s \in S^*$ by


if  there  are  an  even  number  of  Si  in  s; 
otherwise. 


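It is easy to check that $\pi_{\mathcal{P}} \in \mathit{PathF}(\mathcal{P}, \mathcal{F}_{\mathcal{P}})$, while $\pi_{\mathcal{P} \| Q} \notin \mathit{PathF}(\mathcal{P} \| Q, \mathcal{F}_{\mathcal{P} \| Q})$.

The second assertion follows easily from the definition of probabilistic and
unbounded fairness. □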


3.2  Fairness  and  probabilistic  properties 

We analyze the relationship between the three fairness notions with respect to
three classes of properties: acceptance probability, reachability cost, and
long-run average outcome. In the following, we consider an MDP $\mathcal{P} = (S, Acts, A, p)$
together with a fairness constraint $\mathcal{F} : S \mapsto 2^{Acts}$, unless otherwise specified.

Acceptance  probability 

The first class of properties we consider concerns the maximum probability
with which a path satisfies a Rabin acceptance constraint. This maximum
probability is closely related to the maximum probability of satisfying a
linear-time temporal logic formula [9]. A Rabin acceptance condition is a set
of pairs $A = \{(Q^p_1, Q^r_1), \ldots, (Q^p_m, Q^r_m)\}$, where $Q^p_i, Q^r_i \subseteq S$ for all $1 \le i \le m$
[27,30]. A path $\theta$ of the MDP satisfies $A$, written $\theta \models A$, iff there is $1 \le i \le m$
such that

$\mathit{InfS}(\theta) \subseteq Q^p_i$,    $\mathit{InfS}(\theta) \cap Q^r_i \neq \emptyset$.

Fig. 2. An MDP $\mathcal{P}$, and an automaton $Q$. The MDP is deterministic, and has an
associated fairness constraint $\mathcal{F}$ defined by $\mathcal{F}(s_1) = \{a, b\}$. The automaton simply
takes the only possible transition at every step. The portion of the synchronous product
$\mathcal{P} \| Q$ reachable from the state $(s_1, t_1)$ is also depicted.

Given a state $s \in S$, an acceptance condition $A$, and a notion $\Phi \in \{\mathit{NoF},
\mathit{PathF}, \mathit{ProbF}, \mathit{UnbF}\}$ of fairness, the maximum acceptance probability $\Pr^+_s(\Phi, A)$
is defined as

(1)  $\Pr^+_s(\Phi, A) = \sup_{\pi \in \Phi} \Pr^\pi_s(\theta \models A)$.
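
The acceptance test itself is a simple check on the set of states visited infinitely often. A minimal Python sketch follows, assuming that $\mathit{InfS}(\theta)$ is already available as a finite set (for instance, the states on the cycle of an ultimately periodic path); the function name and the example pairs are hypothetical.

```python
def satisfies_rabin(inf_states, pairs):
    """Check whether InfS satisfies a Rabin condition.

    inf_states: the set InfS of states visited infinitely often
    pairs:      list of (Qp, Qr) pairs of state sets
    The condition holds iff, for some pair, InfS is contained in Qp
    and InfS intersects Qr.
    """
    inf_states = set(inf_states)
    return any(
        inf_states <= set(qp) and bool(inf_states & set(qr))
        for qp, qr in pairs
    )


# Hypothetical example: a single pair that requires the path to
# eventually stay inside {s1, s2} and to visit s2 infinitely often.
pairs = [({"s1", "s2"}, {"s2"})]
```

Here `satisfies_rabin({"s1", "s2"}, pairs)` holds, while a path that keeps returning to a state outside `{"s1", "s2"}` fails the containment test.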

Reachability  cost 

The second class of properties we consider concerns the expected cost of
reaching a set of target states in the MDP. To define this quantity, let
$c : S \times Acts \mapsto \mathbb{R}^+$ be a cost function that associates with each $s \in S$ and $a \in A(s)$ a cost
$c(s, a) > 0$. For all initial states $s \in S$, target subsets $R \subseteq S$, and policies $\pi$,
the expected cost of reaching $R$ from $s$ under policy $\pi$ is given by

(2)  $v^\pi_s(c, R) = E^\pi_s \Bigl\{ \sum_{k=0}^{T_R - 1} c(X_k, Y_k) \Bigr\}$,

where $T_R = \min\{k \mid X_k \in R\}$ is the first-passage time in $R$, with the
convention that $\min \emptyset = \infty$. For a notion $\Phi \in \{\mathit{NoF}, \mathit{PathF}, \mathit{ProbF}, \mathit{UnbF}\}$ of
fairness, the minimum expected reachability cost from $s$ to $R$ is then defined
as

(3)  $v^-_s(\Phi, c, R) = \inf_{\pi \in \Phi} v^\pi_s(c, R)$.

Note that $v^-_s(\Phi, c, R)$ is infinite if $R$ cannot be reached with probability 1
from $s$. If the cost $c(s, a)$ represents the time (or the expected time) spent at
$s$ when action $a \in A(s)$ is selected, then the quantity $v^-_s(\Phi, c, R)$ is equal to
the minimum expected time from $s$ to $R$. It is possible to consider also the
more general case of non-negative costs, as done in [8], at the price of some
mathematical complications.
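
For a fixed memoryless policy, the expected cost in (2) satisfies the usual one-step recursion and can be approximated by iterating it. The following Python sketch assumes that $R$ is reached with probability 1 under the given policy (otherwise the iterates grow without bound); the encoding, the function name, and the toy example are assumptions of this sketch.

```python
def expected_reach_cost(states, policy, trans, cost, target, iterations=2000):
    """Approximate v^pi_s(c, R) for a memoryless policy pi.

    policy: dict state -> dict action -> probability, pi(s)(a)
    trans:  dict (state, action) -> dict successor -> probability
    cost:   dict (state, action) -> positive cost c(s, a)
    Iterates v(s) = sum_a pi(s)(a) * (c(s,a) + sum_t p(s,a)(t) * v(t)),
    with v(s) = 0 on the target set.
    """
    v = {s: 0.0 for s in states}
    for _ in range(iterations):
        new_v = {}
        for s in states:
            if s in target:
                new_v[s] = 0.0
                continue
            new_v[s] = sum(
                prob_a
                * (cost[(s, a)] + sum(p * v[t] for t, p in trans[(s, a)].items()))
                for a, prob_a in policy[s].items()
            )
        v = new_v
    return v


# Hypothetical example: from s0, action "a" costs 1 and reaches the goal
# with probability 1/2, otherwise it stays in s0.
states = ["s0", "goal"]
policy = {"s0": {"a": 1.0}}
trans = {("s0", "a"): {"goal": 0.5, "s0": 0.5}}
cost = {("s0", "a"): 1.0}
v = expected_reach_cost(states, policy, trans, cost, {"goal"})
```

In the example the number of steps to the goal is geometric with mean 2, so `v["s0"]` approaches 2.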


Long-run  average  outcome 

Long-run  average  properties  are  related  to  the  average  behavior  of  the  system, 
measured  over  an  interval  of  time  whose  length  diverges  to  infinity  [8,10]. 
The  specification  of  these  properties  is  based  on  the  notion  of  experiment. 
An  experiment  is  a  finite  portion  of  a  path,  which  corresponds  to  a  task  of 
interest  for  the  performance  or  reliability  analysis  of  the  system.  An  example 
of  experiment  consists  in  a  request  to  access  a  shared  resource,  followed  either 
by  a  grant  or  a  rejection.  With  each  experiment  is  associated  a  numerical 
value  called  the  outcome  of  the  experiment.  The  long-run  average  outcome  of 
the  experiment  is  simply  the  average  value  of  such  outcomes,  measured  over 
a  period  of  time  whose  length  diverges  to  infinity.  In  the  previous  example, 
if  we  associate  outcome  0  with  the  experiments  that  end  with  a  rejection, 
and  outcome  1  with  those  that  end  with  a  grant,  then  the  long-run  average 
outcome  of  the  experiment  is  equal  to  the  long-run  fraction  of  requests  that 
are granted. The long-run average outcome is defined on the basis of two
functions $R, W : S \times Acts \mapsto \mathbb{R}^+$ that associate with each $s \in S$ and $a \in A(s)$
the following quantities:

•  the  average  outcome  R(s,  a)  >  0  obtained  when  selecting  action  a  at  s; 

•  a  completion  rate  W(s,a)  >  0,  equal  to  the  probability  of  completing  the 
experiment  when  selecting  action  a  at  s. 

The restriction that $W$ be non-zero is artificial, and in fact [8,10] considers the
general case of non-negative $W$ (and arbitrary $R$). We adopted this restriction
because it leads to a considerably simpler mathematical treatment, while
preserving the essence of the argument. Given $s \in S$, the functions $R$, $W$, and
a policy $\pi$, the expected long-run average outcome $H^\pi_s(R, W)$ is defined as

(4)  $H^\pi_s(R, W) = E^\pi_s \Bigl\{ \limsup_{n \to \infty} \frac{\sum_{k=0}^{n-1} R(X_k, Y_k)}{\sum_{k=0}^{n-1} W(X_k, Y_k)} \Bigr\}$.

For $n < \infty$, the numerator of (4) represents the total outcome obtained during
the first $n$ steps of the path, and the denominator represents the number
of experiments performed. The limit for $n \to \infty$ of this ratio corresponds
therefore to the average outcome per experiment along a path, and $H^\pi_s(R, W)$
is the expected value of this average outcome, computed considering all paths
from $s$. Given $s \in S$, the functions $R$, $W$, and a notion of fairness $\Phi \in
\{\mathit{NoF}, \mathit{PathF}, \mathit{ProbF}, \mathit{UnbF}\}$, we finally define the maximum long-run average
outcome by:

(5)  $H^+_s(\Phi, R, W) = \sup_{\pi \in \Phi} H^\pi_s(R, W)$.
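
The ratio inside the expectation of (4) can be evaluated directly on a finite path prefix. The Python sketch below computes it for a prefix given as a list of state-action pairs; the function name and the request/grant example values are assumptions of this sketch.

```python
def average_outcome(prefix, R, W):
    """Ratio of accumulated outcome to accumulated completion rate.

    prefix: list of (state, action) pairs (X_k, Y_k), k = 0..n-1
    R, W:   dicts (state, action) -> value, with W strictly positive
    """
    total_r = sum(R[(s, a)] for s, a in prefix)
    total_w = sum(W[(s, a)] for s, a in prefix)
    return total_r / total_w


# Hypothetical example: a request is alternately granted (outcome 1)
# and rejected (outcome 0); every step completes one experiment.
R = {("req", "grant"): 1.0, ("req", "reject"): 0.0}
W = {("req", "grant"): 1.0, ("req", "reject"): 1.0}
cycle = [("req", "grant"), ("req", "reject")]
ratio = average_outcome(cycle * 500, R, W)
```

For this periodic path the ratio over any whole number of cycles is the fraction of granted requests, here one half.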

The quantity defined in (4) is related to the average reward of semi-Markov
decision processes [26,3]. However, in the classical definition the limit and
expectation are exchanged, and the expectation is distributed in two
expectations, one above and one below the fraction line. The difference between the
two definitions is discussed in [8].

3.2.1  Preview  of  the  results 

The behavior of the different notions of fairness with respect to the three above
classes of properties is summarized by the following theorem.

Theorem 2 For all states $s$, and for all $A$, resp. all $c$, $R$, resp. all $R$, $W$,
and for a general finite-state MDP with a fairness constraint, the following
relations hold:

(i) Acceptance probability:

$\Pr^+_s(\mathit{NoF}, A) = \Pr^+_s(\mathit{UnbF}, A) \ge \Pr^+_s(\mathit{PathF}, A) = \Pr^+_s(\mathit{ProbF}, A)$

(ii) Reachability cost:

$v^-_s(\mathit{NoF}, c, R) = v^-_s(\mathit{PathF}, c, R) \le v^-_s(\mathit{UnbF}, c, R) = v^-_s(\mathit{ProbF}, c, R)$

(iii) Long-run average outcome:

$H^+_s(\mathit{NoF}, R, W) = H^+_s(\mathit{UnbF}, R, W) \ge H^+_s(\mathit{PathF}, R, W) = H^+_s(\mathit{ProbF}, R, W)$

Moreover, the inequalities in the above relations cannot in general be replaced
by equalities.

The above theorem tells us that probabilistic fairness sides with path fairness
in finite-state systems, except for the case of reachability cost. This theorem
also supports our claim that a probabilistic treatment of fairness is not
any harder than a traditional one, except for the case of minimum expected
reachability cost; even in this case, we will show that working with
probabilistic rather than path fairness entails only minor additional
complications.

The simplicity of Theorem 2 is due in part to the fact that the quantities
in (1), (3) and (5) have been defined using sup and inf, and we have
not distinguished between the cases in which the suprema and infima can be
achieved or not (i.e. whether sup and inf can be replaced with max and min).
This distinction would have blurred the insight provided by the theorem, and
would have required the use of more complex model-checking algorithms.
Algorithms that distinguish between these two cases for path fairness and Rabin
acceptance conditions have been presented in [17].

In the remainder of the paper, we provide model-checking algorithms for
all the combinations of the three notions of fairness and the three classes of
properties. The equalities in Theorem 2 follow from the fact that the notions
of fairness share the same model-checking algorithms. The fact that the
inequalities cannot be in general replaced by equalities is shown by providing
counterexamples.

4  Tools  for  Fairness 

In  this  section,  we  present  some  results  on  MDPs  that  will  be  used  in  the 
construction  and  justification  of  the  model-checking  algorithms. 

4.1 End components

Given an MDP $\mathcal{P} = (S, Acts, A, p)$, a sub-MDP is a pair $(C, D)$, where $C \subseteq S$
is a subset of states and $D : S \mapsto 2^{Acts}$ is an action assignment, i.e. a function
that associates to each $s \in S$ a subset $D(s) \subseteq A(s)$ of actions. The sub-MDP
corresponds thus to a subset of states and actions of the original MDP. With
each sub-MDP $(C, D)$ we associate its set of state-action pairs

$\mathit{SAPairs}(C, D) = \{(s, a) \in \mathit{SAPairs}(\mathcal{P}) \mid s \in C \wedge a \in D(s)\}$.

Similarly, with each state-action set $\mathcal{X} \subseteq \mathit{SAPairs}(\mathcal{P})$ we associate a sub-MDP
$(C, D) = \mathit{SAPairs}^{-1}(\mathcal{X})$, defined by

$C = \{s \in S \mid \exists a \in Acts \,.\, (s, a) \in \mathcal{X}\}$

and, for all $s \in S$, by

$D(s) = \{a \in Acts \mid (s, a) \in \mathcal{X}\}$.

We say that a sub-MDP $(C, D)$ is contained in a sub-MDP $(C', D')$ if

$\mathit{SAPairs}(C, D) \subseteq \mathit{SAPairs}(C', D')$.

We say that a sub-MDP $(C, D)$ is an end component (abbreviated by EC) if
the following conditions hold:

• Closure: for all $s \in C$, all $a \in D(s)$, and all $t \in S$,
if $p(s, a)(t) > 0$ then $t \in C$.

• Connectivity: Let $E = \{(s, t) \in C \times C \mid \exists a \in D(s) \,.\, p(s, a)(t) > 0\}$.
The graph $(C, E)$ is strongly connected.
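
The two conditions can be checked directly. The Python sketch below tests closure and strong connectivity for a candidate sub-MDP; it assumes that `trans[(s, a)]` lists only successors with positive probability, and the function names and the three-state example are hypothetical.

```python
def reachable(start, succ):
    """States reachable from `start` in the graph given by `succ`."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in succ[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def is_end_component(C, D, trans):
    """Check closure and connectivity of the sub-MDP (C, D).

    C:     set of states
    D:     dict state -> set of actions, with D(s) a subset of A(s)
    trans: dict (state, action) -> dict successor -> positive probability
    """
    C = set(C)
    if not C:
        return False
    # Closure: every selected action stays inside C.
    for s in C:
        for a in D[s]:
            if not set(trans[(s, a)]) <= C:
                return False
    # Connectivity: the graph (C, E) is strongly connected, i.e. every
    # state is reachable from, and can reach, an arbitrary root.
    succ = {s: {t for a in D[s] for t in trans[(s, a)]} for s in C}
    pred = {s: set() for s in C}
    for s, targets in succ.items():
        for t in targets:
            pred[t].add(s)
    root = next(iter(C))
    return reachable(root, succ) == C and reachable(root, pred) == C


# Hypothetical example: s2 has a self-loop "b" and an exit "c" to s3.
trans = {
    ("s1", "a"): {"s2": 1.0},
    ("s2", "b"): {"s2": 1.0},
    ("s2", "c"): {"s3": 1.0},
    ("s3", "d"): {"s3": 1.0},
}
```

With this example, `({"s2"}, {"s2": {"b"}})` is an end component, whereas adding the action `"c"` to `D("s2")` violates closure.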

Given a subset $U \subseteq S$ of states, we say that an EC $(C, D)$ is maximal in $U$
if $C \subseteq U$, and if there is no other EC $(C', D')$ with $C' \subseteq U$ that properly
contains $(C, D)$. We denote by $\mathit{Mec}(U)$ the set of maximal ECs in $U$; this set
can be computed in time polynomial in the size of the MDP using simple graph
algorithms. In a purely probabilistic system, fair end components correspond
to the closed recurrent classes of the Markov chain underlying the system [18].
The significance of end components in the case of Markov decision processes
is stated by the following theorem.

Theorem 3 [8] For all $s \in S$ and all policies $\pi$, we have

$\Pr^\pi_s(\mathit{SAPairs}^{-1}(\mathit{InfSA}) \text{ is an EC}) = 1$.

Given a fairness constraint $\mathcal{F}$ for $\mathcal{P}$, we say that an end component $(C, D)$ is
a fair end component (FEC) if the following condition holds, in addition to
closure and connectivity:

• Fairness: For all $s \in C$, we have $\mathcal{F}(s) \subseteq D(s)$.

We define containment and maximality for FECs as for ECs, and we denote
by $\mathit{MFec}(U, \mathcal{F})$ the set of maximal FECs contained in $U \subseteq S$. Again, for each
$U \subseteq S$ the set $\mathit{MFec}(U, \mathcal{F})$ can be computed in time polynomial in the size of
the MDP. The following theorem indicates that fair end components are the
corresponding concept to end components in presence of fairness.

Theorem 4 For all $s \in S$ and all $\pi \in \mathit{ProbF} \cup \mathit{PathF}$, we have
$\Pr^\pi_s(\mathit{SAPairs}^{-1}(\mathit{InfSA}) \text{ is a FEC}) = 1$.

This  theorem  was  proved  by  [17]  for  path  fairness,  and  by  [8]  for  probabilistic 
fairness.  The  proof  for  probabilistic  fairness  is  in  fact  immediate:  one  needs 
only  examine  the  definition  of  probabilistic  fairness  to  realize  that  Theorem  4 
follows  immediately  from  Theorem  3.  Unbounded  fairness  behaves  differently 
from  path  or  probabilistic  fairness  with  respect  to  end  components,  as  shown 
by  the  following  proposition. 

Proposition 2 For every EC $(C, D)$ and $0 < q < 1$, we can construct a
policy $\pi(q) \in \mathit{UnbF}$ such that for all $s \in C$,

$\Pr^{\pi(q)}_s(\mathit{SAPairs}^{-1}(\mathit{InfSA}) = (C, D)) \ge q$.

Proof. Given $q$, we construct an infinite sequence $\{r_i(q)\}_{i \ge 0}$ of real numbers
such that $0 < r_i(q) < 1$ for $i \ge 0$, and $\prod_{i=0}^{\infty} r_i(q) = q$, by letting $r_i(q) =
q^{1/2^{i+1}}$; the product equals $q$ because $\sum_{i \ge 0} 2^{-(i+1)} = 1$. Then, policy $\pi(q)$ can be
constructed as follows: at step $i$ of the
path, if $X_i \notin C$, then $\pi$ chooses uniformly at random an action from $A(X_i)$.
If instead $X_i \in C$, then $\pi$ chooses each action in $D(X_i)$ with probability

$\frac{r_i(q)}{|D(X_i)|} + \frac{1 - r_i(q)}{|A(X_i)|}$,

and each action in $A(X_i) \setminus D(X_i)$ with probability $(1 - r_i(q)) / |A(X_i)|$. It is
easy to check that the policy $\pi(q)$ thus constructed has the required property.
Note that policy $\pi(q)$ is history-dependent, i.e. its behavior at $t$ depends on
the prefix of path from the starting state $s$ to $t$ (in this case, the dependence
is through the length of the path prefix). □


4.2 Parametric Markov chains

To help with the construction of sets of approximating fair policies, we present
some results on parametric Markov chains. In these chains the coefficients of
the transition matrix are expressed as a function of a parameter. We present
conditions that ensure that if the coefficients are continuous functions of the
parameter, then also the steady-state distribution of the chain depends
continuously on the parameter.

Given a memoryless policy $\pi$, we define a transition matrix $P = [p_{s,t}]_{s,t \in S}$
corresponding to $\pi$ by taking, for all $s, t \in S$,

$p_{s,t} = \sum_{a \in A(s)} \pi(s)(a) \, p(s, a)(t)$.

Recall that a sub-stochastic matrix is a matrix $P = [p_{s,t}]_{s,t \in S}$ such that
$0 \le p_{s,t} \le 1$ for $s, t \in S$, and $\sum_{t \in S} p_{s,t} \le 1$ for all $s \in S$ [18]. The matrix
corresponding to a memoryless policy is sub-stochastic (in fact, it is also stochastic,
since $\sum_{t \in S} p_{s,t} = 1$ for all $s \in S$). Given a sub-stochastic matrix $P$, the
steady-state (or limiting) matrix $P^*$ of $P$ is defined by $P^* = \lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} P^k$. The
following two propositions can be proved by linear algebra arguments [8], and
they provide sufficient conditions under which the steady-state distribution of
a Markov chain is a continuous function of a parameter. The first proposition
covers the case in which the closed recurrent classes of the chain do not depend
on the parameter.

Proposition 3 For a fixed $N$, consider a family $P(x) = [p_{s,t}(x)]_{s,t \in S}$ of
sub-stochastic matrices parameterized by a parameter $x \in I$, where $I \subseteq \mathbb{R}$
is an interval of real numbers. Assume that the Markov chain having $P(x)$ as
transition matrix has the same set of closed recurrent classes for all $x \in I$.
Then, if the coefficients of $P(x)$ depend continuously on $x$ for $x \in I$, also
the coefficients of the steady-state matrix $P^*(x)$ depend continuously on $x$ for
$x \in I$.

A similar result holds for chains in which there is a single closed recurrent
class (which may change as the parameter changes), and there is a fixed state
that is always in that class, for all values of the parameter. To state the
result, we say that a state is surely recurrent if the Markov chain has only one
closed recurrent class, and the state belongs to that class. In this case, the
steady-state matrix $P^*$ can be written as $P^* = \mathbf{1}^{T} u$, where $\mathbf{1}^{T}$ is the transpose
of a vector consisting of $|S|$ 1's, and $u$ is the vector of the steady-state (or
limiting) distribution of the Markov chain.

Proposition 4 For a fixed $N$, consider a family $P(x) = [p_{s,t}(x)]_{s,t \in S}$ of
sub-stochastic matrices parameterized by a parameter $x \in I$, where $I \subseteq \mathbb{R}$ is an
interval of real numbers. Assume that there is a state $1 \le k_0 \le N$ that is surely
recurrent for all $x \in I$. Then, if the coefficients of $P(x)$ depend continuously
on $x$ for $x \in I$, also the coefficients of the steady-state distribution vector $u(x)$
depend continuously on $x$ for $x \in I$.


4.3 Unconditionally fair policy

In the following arguments, it will be useful to have a fixed policy that is
fair with respect to all notions of fairness discussed in this paper. Hence, we
denote by $\pi_f$ the memoryless policy that at each state $s \in S$ chooses uniformly
at random an action $a \in A(s)$.

5  Acceptance  Probability 

In  this  section  we  prove  Theorem  2,  part  (i),  and  we  provide  algorithms  for 
computing  the  maximum  acceptance  probability  under  the  different  notions 
of  fairness.  The  equalities  in  Theorem  2,  part  (i)  are  proved  by  showing  that 
the  algorithms  for  the  relative  notions  of  fairness  coincide. 

5.1  Probabilistic  fairness 

The algorithm for computing the maximum acceptance probability for
probabilistic fairness is taken from [8]. By Theorem 4, with probability 1 the set
of states repeated infinitely often along a path forms a FEC. Given a Rabin
acceptance condition $A = \{(Q^p_1, Q^r_1), \ldots, (Q^p_m, Q^r_m)\}$ and a FEC $(C, D)$, we
say that the FEC satisfies $A$ (written $(C, D) \models A$) iff there is $1 \le i \le m$ such
that $C \subseteq Q^p_i$ and $C \cap Q^r_i \neq \emptyset$. If $(C, D)$ satisfies $A$, and if a path starting from
$C$ chooses at each $s \in C$ an action in $D(s)$ uniformly at random, the path will
satisfy $A$ with probability 1. Hence, let

$R_A = \bigcup \{C \mid (C, D) \text{ is a FEC and } (C, D) \models A\}$

be the union of the sets of states of all the FECs that satisfy $A$. The set $R_A$
can be computed more efficiently by

$R_A = \bigcup_{i=1}^{m} \bigcup \{C \mid (C, D) \in \mathit{MFec}(Q^p_i, \mathcal{F}) \wedge C \cap Q^r_i \neq \emptyset\}$.
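
The computation of $R_A$ through maximal FECs can be sketched in Python as follows. The refinement loop (restrict actions to those that stay inside the candidate set, drop states that violate fairness or have no remaining action, then split into strongly connected components) is one standard way to obtain maximal end components and is an assumption of this sketch, not necessarily the algorithm of [8]. All names and the example MDP are hypothetical, and `trans[(s, a)]` is assumed to list only positive-probability successors.

```python
def reachable(start, succ):
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in succ[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def sccs(nodes, succ):
    """Strongly connected components by mutual reachability (quadratic)."""
    reach = {u: reachable(u, succ) for u in nodes}
    comps, done = [], set()
    for u in nodes:
        if u not in done:
            comp = {v for v in reach[u] if u in reach[v]}
            comps.append(comp)
            done |= comp
    return comps


def maximal_fecs(U, actions, trans, fair):
    """Maximal fair end components (C, D) with C contained in U."""
    result, work = [], [set(U)]
    while work:
        C = work.pop()
        while True:
            # Keep only the actions whose successors all stay in C.
            D = {s: {a for a in actions[s] if set(trans[(s, a)]) <= C} for s in C}
            # Drop states with no action left or with a fair action removed.
            bad = {s for s in C if not D[s] or not fair.get(s, set()) <= D[s]}
            if not bad:
                break
            C = C - bad
        if not C:
            continue
        succ = {s: {t for a in D[s] for t in trans[(s, a)]} for s in C}
        comps = sccs(C, succ)
        if len(comps) == 1:
            result.append((C, D))
        else:
            work.extend(comps)
    return result


def accepting_region(pairs, actions, trans, fair):
    """R_A: union of the maximal FECs in Qp_i that intersect Qr_i."""
    region = set()
    for qp, qr in pairs:
        for C, _D in maximal_fecs(qp, actions, trans, fair):
            if C & set(qr):
                region |= C
    return region


# Hypothetical example: s2 can loop with "b" or leave with "c".
actions = {"s1": {"a"}, "s2": {"b", "c"}, "s3": {"d"}}
trans = {
    ("s1", "a"): {"s2": 1.0},
    ("s2", "b"): {"s2": 1.0},
    ("s2", "c"): {"s3": 1.0},
    ("s3", "d"): {"s3": 1.0},
}
pairs = [({"s2"}, {"s2"})]
```

In the example, requiring fairness for action `"c"` at `"s2"` leaves no FEC inside `{"s2"}`, so the accepting region is empty; without fairness constraints it is `{"s2"}`.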

Once $R_A$ is reached, it is easy to see that the acceptance condition can be
satisfied with probability 1 under a probabilistically fair policy. In fact, there
is a memoryless policy $\pi_r \in \mathit{ProbF}$ such that $\Pr^{\pi_r}_s(\theta \models A) = 1$ for every
$s \in R_A$ (see [8, §8] for the details of the construction of $\pi_r$, inspired by [7]).
The surprising fact is that it suffices to compute the maximum probability
of reaching $R_A$ under any policy, rather than under any probabilistically fair
policy, as stated by the following proposition (as shown for path fairness in
[17]).

Proposition 5 For all $s \in S$, we have $\Pr^+_s(\mathit{ProbF}, A) = \max_{\pi \in \Pi} \Pr^\pi_s(\Diamond R_A)$.

In this proposition, $\Diamond R_A$ denotes the event of reaching $R_A$, as defined in
Section 2.1. We write $\max_{\pi \in \Pi} \Pr^\pi_s(\Diamond R_A)$ instead of $\sup_{\pi \in \Pi} \Pr^\pi_s(\Diamond R_A)$, even
though $\Pi$ is an infinite set, to underline the fact that there is a policy $\pi_0 \in \Pi$
such that

$\Pr^{\pi_0}_s(\Diamond R_A) = \sup_{\pi \in \Pi} \Pr^\pi_s(\Diamond R_A)$.

A similar convention is used throughout the remainder of the paper. The
interest of Proposition 5 lies in the fact that the quantity $\max_{\pi \in \Pi} \Pr^\pi_s(\Diamond R_A)$
can be computed using a well-known reduction to linear programming, which
leads to a polynomial-time algorithm [6].
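
The text relies on linear programming for this step. As a simpler illustration, the Python sketch below approximates the same maximum reachability probability by value iteration instead; this is a substituted technique that converges from below and is not the polynomial-time reduction of [6]. The encoding, names, and example are assumptions of this sketch.

```python
def max_reach_probability(states, actions, trans, target, iterations=1000):
    """Approximate max over policies of Pr(reach target) by value iteration.

    actions: dict state -> set of enabled actions
    trans:   dict (state, action) -> dict successor -> probability
    """
    v = {s: 1.0 if s in target else 0.0 for s in states}
    for _ in range(iterations):
        v = {
            s: 1.0
            if s in target
            else max(
                sum(p * v[t] for t, p in trans[(s, a)].items())
                for a in actions[s]
            )
            for s in states
        }
    return v


# Hypothetical example: "risky" reaches the goal with probability 1/2 and
# is otherwise trapped; "slow" can be retried until it succeeds.
states = ["s0", "goal", "sink"]
actions = {"s0": {"risky", "slow"}, "goal": {"stay"}, "sink": {"stay"}}
trans = {
    ("s0", "risky"): {"goal": 0.5, "sink": 0.5},
    ("s0", "slow"): {"goal": 0.25, "s0": 0.75},
    ("goal", "stay"): {"goal": 1.0},
    ("sink", "stay"): {"sink": 1.0},
}
v = max_reach_probability(states, actions, trans, {"goal"})
```

In the example the optimal policy always chooses `"slow"`, so the value at `"s0"` approaches 1, while the value at `"sink"` stays 0.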

Proof. To prove Proposition 5, we prove that for all $s \in S$ the following
equalities hold:

(6)  $\sup_{\pi \in \mathit{ProbF}} \Pr^\pi_s(\theta \models A) = \sup_{\pi \in \mathit{ProbF}} \Pr^\pi_s(\Diamond R_A) = \max_{\pi \in \Pi} \Pr^\pi_s(\Diamond R_A)$.

To prove (6), we first note that

$\max_{\pi \in \Pi} \Pr^\pi_s(\Diamond R_A) \ge \sup_{\pi \in \mathit{ProbF}} \Pr^\pi_s(\Diamond R_A) \ge \sup_{\pi \in \mathit{ProbF}} \Pr^\pi_s(\theta \models A)$.

The first inequality is immediate; the second follows from the fact that a path
from $s$ follows with probability 1 a FEC, so that the probability of satisfying $A$
without entering $R_A$ is 0. In the reverse direction, a result on Markov decision
processes establishes the existence of a memoryless deterministic policy $\pi_d$
such that, for all $s \in S$,

$\Pr^{\pi_d}_s(\Diamond R_A) = \max_{\pi \in \Pi} \Pr^\pi_s(\Diamond R_A)$

(see for example [6], and for a detailed proof, [8, §3]). Let also $B \subseteq S$ be the
set of states that cannot reach $R_A$. From $\pi_d$, we construct the policy $\pi_e$ that
coincides with $\pi_d$ on $S \setminus (R_A \cup B)$, with $\pi_r$ on $R_A$, and with $\pi_f$ on $B$. Since
$\pi_e$ and $\pi_d$ coincide on $C = S \setminus (R_A \cup B)$, we have

(7)  $\Pr^{\pi_e}_s(\Diamond R_A) = \Pr^{\pi_d}_s(\Diamond R_A) = \max_{\pi \in \Pi} \Pr^\pi_s(\Diamond R_A)$

for all $s \in S$. If $\pi_e \in \mathit{ProbF}$, then the argument is easily concluded. Otherwise,
we construct a set of probabilistically fair policies that approximates $\pi_e$. For
$0 \le x < 1$, define the memoryless policy $\pi[x]$ by:

$\pi[x](s)(a) = \pi_r(s)(a)$ if $s \in R_A$;
$\pi[x](s)(a) = (1 - x) \, \pi_e(s)(a) + x \, \pi_f(s)(a)$ otherwise.

It is easy to check that $\pi[x] \in \mathit{ProbF}$ for $0 < x < 1$. Since for $0 < x < 1$
policy $\pi[x]$ is just one of many probabilistically fair ones that tries to satisfy
$A$, we have

(8)  $\sup_{\pi \in \mathit{ProbF}} \Pr^\pi_s(\theta \models A) \ge \lim_{x \to 0} \Pr^{\pi[x]}_s(\Diamond R_A)$.

To complete the argument, from (7) it remains to show that

(9)  $\lim_{x \to 0} \Pr^{\pi[x]}_s(\Diamond R_A) \ge \Pr^{\pi_e}_s(\Diamond R_A)$.

To this end, denote by $P(x) = [p_{s,t}(x)]_{s,t \in S}$ the matrix corresponding to $\pi[x]$,
for $0 \le x < 1$. Note that $P(0)$ is equal to the matrix $P_e$ corresponding to $\pi_e$.
The closed recurrent classes of $P(x)$ are constant for $0 \le x < 1$. In fact, for
$0 \le x < 1$ the closed recurrent classes of $\pi[x]$ are all contained in $B \cup R_A$,
and $\pi[x]$ does not depend on $x$ in $B \cup R_A$. Denoting by $P^*(x) = [p^*_{s,t}(x)]_{s,t \in S}$
the steady-state matrix corresponding to $P(x)$, we can write the reachability
probability of $R_A$ for all $s \in S$ as

$\Pr^{\pi[x]}_s(\Diamond R_A) = \sum_{t \in R_A} p^*_{s,t}(x)$.

From $\lim_{x \to 0} P(x) = P(0) = P_e$, by Proposition 3, we have $\lim_{x \to 0} P^*(x) =
P^*_e$, from which we obtain (9), which together with (8) and (7) concludes the
argument. □

5.2  Path  fairness 

Since Theorem 4 holds both for probabilistic and for path fairness, the first
step in the computation of $\Pr^+_s(\mathit{PathF}, A)$ consists in computing the set $R_A \subseteq
S$, and it coincides with the first step of the computation of $\Pr^+_s(\mathit{ProbF}, A)$.
In fact, we want to prove that the algorithm for path fairness is the same as
the one for probabilistic fairness, as stated by the following proposition.

Proposition 6 For all $s \in S$, we have

$\Pr^+_s(\mathit{PathF}, A) = \max_{\pi \in \Pi} \Pr^\pi_s(\Diamond R_A) = \Pr^+_s(\mathit{ProbF}, A)$.

Proof. To prove the proposition, we prove that the following equalities hold
for all $s \in S$:

(10)  $\sup_{\pi \in \mathit{PathF}} \Pr^\pi_s(\theta \models A) = \sup_{\pi \in \mathit{PathF}} \Pr^\pi_s(\Diamond R_A) = \max_{\pi \in \Pi} \Pr^\pi_s(\Diamond R_A)$.

Again, in one direction the inequalities follow easily:

$\max_{\pi \in \Pi} \Pr^\pi_s(\Diamond R_A) \ge \sup_{\pi \in \mathit{PathF}} \Pr^\pi_s(\Diamond R_A) \ge \sup_{\pi \in \mathit{PathF}} \Pr^\pi_s(\theta \models A)$.

In the other direction, note that probabilistic fairness implies path fairness
(Proposition 1). Thus, to prove that for all $s \in S$

(11)  $\sup_{\pi \in \mathit{PathF}} \Pr^\pi_s(\theta \models A) \ge \max_{\pi \in \Pi} \Pr^\pi_s(\Diamond R_A)$

it suffices to note that for $0 < x < 1$, the policy $\pi[x]$ used in the proof of (8) and
(9) is also path fair. Hence, we can immediately duplicate the argument for
(8) and (9) for path fairness, leading to (11) and finally (10). □

5.3 Unbounded fairness

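For unbounded fairness, we define the set $R^*_A$ by

$R^*_A = \bigcup \{C \mid (C, D) \text{ is an EC and } (C, D) \models A\}
= \bigcup_{i=1}^{m} \bigcup \{C \mid (C, D) \in \mathit{Mec}(Q^p_i) \wedge C \cap Q^r_i \neq \emptyset\}$.

Differently from $R_A$, the set $R^*_A$ is computed disregarding the fairness
constraints of the MDP. In fact, to compute the maximum acceptance probability
for unbounded fairness, it turns out that it is not necessary to take fairness
into account, as the following proposition states.

Proposition 7 For all $s \in S$, we have

$\Pr^+_s(\mathit{UnbF}, A) = \max_{\pi \in \Pi} \Pr^\pi_s(\Diamond R^*_A) = \Pr^+_s(\mathit{NoF}, A)$.

Proof. The rightmost equality simply encodes the algorithm for maximum
acceptance probability without fairness [9]. Regarding the leftmost equality,
again in one direction the inequalities follow easily: for all $s \in S$,

$\max_{\pi \in \Pi} \Pr^\pi_s(\Diamond R^*_A) \ge \sup_{\pi \in \mathit{UnbF}} \Pr^\pi_s(\Diamond R^*_A) \ge \sup_{\pi \in \mathit{UnbF}} \Pr^\pi_s(\theta \models A) = \Pr^+_s(\mathit{UnbF}, A)$.

In the other direction, in analogy with the proof of Proposition 2, for all
$0 < \varepsilon < 1$ we can construct a policy $\pi[\varepsilon] \in \mathit{UnbF}$ such that for all $s \in S$ and
all finite path prefixes $\sigma$ ending in $R^*_A$, we have

$\Pr^{\pi[\varepsilon]}_s(\theta \models A \mid \sigma \preceq \theta) \ge 1 - \varepsilon$,

where the conditioning is on $\sigma$ being a prefix of $\theta$. Let also $\pi_d$ be a policy such that

$\Pr^{\pi_d}_s(\Diamond R^*_A) = \max_{\pi \in \Pi} \Pr^\pi_s(\Diamond R^*_A)$,

and let $B \subseteq S$ be the set of states that cannot reach $R^*_A$. For all $s \in S$, all
$a \in A(s)$, all $0 < x < 1$, all $0 < \varepsilon < 1$, and all $\sigma \in S^*$, we define policy $\pi[x, \varepsilon]$
by:

$\pi[x, \varepsilon](\sigma s)(a) = (1 - x) \, \pi_d(s)(a) + x \, \pi_f(s)(a)$ if $s \in S \setminus (B \cup R^*_A)$;
$\pi[x, \varepsilon](\sigma s)(a) = \pi_f(s)(a)$ if $s \in B$;
$\pi[x, \varepsilon](\sigma s)(a) = \pi[\varepsilon](\sigma s)(a)$ if $s \in R^*_A$.

For $0 < x, \varepsilon < 1$, we have $\pi[x, \varepsilon] \in \mathit{UnbF}$; the result then follows by noting
that for all $s \in S$ we have

(12)  $\lim_{x \to 0} \Pr^{\pi[x, \varepsilon]}_s(\Diamond R^*_A) = \Pr^{\pi_d}_s(\Diamond R^*_A) = \max_{\pi \in \Pi} \Pr^\pi_s(\Diamond R^*_A)$

and hence

(13)  $\Pr^+_s(\mathit{UnbF}, A) = \sup_{\pi \in \mathit{UnbF}} \Pr^\pi_s(\theta \models A)
\ge \lim_{x \to 0} \lim_{\varepsilon \to 0} \Pr^{\pi[x, \varepsilon]}_s(\theta \models A)
= \lim_{x \to 0} \Pr^{\pi[x, \varepsilon]}_s(\Diamond R^*_A)
= \Pr^{\pi_d}_s(\Diamond R^*_A)$,

as was to be shown. The proof of (12) and (13) follows the lines of the proofs
of Propositions 5 and 6. □

Fig. 3. Two MDPs $\mathcal{P}$ and $Q$. The MDP $\mathcal{P}$ is deterministic, and has an associated
fairness constraint $\mathcal{F}_{\mathcal{P}}$ defined by $\mathcal{F}_{\mathcal{P}}(s_2) = \{c\}$, and $\mathcal{F}_{\mathcal{P}}(s_1) = \mathcal{F}_{\mathcal{P}}(s_3) = \emptyset$. The
MDP $Q$ has an associated fairness constraint $\mathcal{F}_Q$ defined by $\mathcal{F}_Q(t_1) = \{a, b\}$, and
$\mathcal{F}_Q(t_2) = \mathcal{F}_Q(t_3) = \emptyset$.

Finally, the result of Theorem 2, part (i) follows by noting that $R_A \subseteq R^*_A$,
and by comparing Propositions 5, 6, and 7.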

5.4 A counterexample to equality

To see that the inequality in Theorem 2, part (i) cannot in general be
replaced by equality, consider the MDP $\mathcal{P}$ of Figure 3, together with the
acceptance condition $A = \{(\{s_2\}, \{s_2\})\}$. We have $\Pr^+(\mathit{ProbF}, A) = 0$ and
$\Pr^+(\mathit{UnbF}, A) = 1$.


6  Reachability  Cost 

In this section, we study the algorithms for computing the minimum
reachability cost under the various notions of fairness, and in the process we prove
Theorem 2, part (ii).

Given a state $s \in S$ and an action $a \in A(s)$ for $s$, we denote by

$\mathit{dest}(s, a) = \{t \in S \mid p(s, a)(t) > 0\}$

the set of possible successors of $s$ when $a$ is selected.

Since the costs are strictly positive, the cost from a state $s \in S$ to the target
set $R \subseteq S$ can be finite only if $R$ can be reached from $s$ with probability 1.
Hence, before presenting the algorithms for the various notions of fairness, we
present an algorithm that computes the set of states from which $R$ can be
reached with probability 1, under a generic fairness constraint $\mathcal{G} : S \mapsto 2^{Acts}$
(not necessarily coinciding with the constraint $\mathcal{F}$ of the MDP). The algorithm
is essentially the algorithm of [8], presented in an improved notation. To
present the algorithm, we define the predicate $\mathit{FApre}(Y, X, \mathcal{G})$ over $S$, where
$X, Y \subseteq S$ and $\mathcal{G} : S \mapsto 2^{Acts}$, by $s \models \mathit{FApre}(Y, X, \mathcal{G})$ iff:

$\forall a \in \mathcal{G}(s) \,.\, \mathit{dest}(s, a) \subseteq Y
\;\wedge\; \exists a \in A(s) \,.\, (\mathit{dest}(s, a) \subseteq Y \wedge \mathit{dest}(s, a) \cap X \neq \emptyset)$.

For $R \subseteq S$ and $\mathcal{G} : S \mapsto 2^{Acts}$, we then define $\mathit{Reach}(R, \mathcal{G})$ by the μ-calculus
formula:

(14)  $\mathit{Reach}(R, \mathcal{G}) = \nu Y \,.\, \mu X \,.\, (\mathit{FApre}(Y, X, \mathcal{G}) \vee R)$,

where we have used the slightly improper notation of using $R$ as a predicate
that holds exactly for the states in $R$. The following proposition can be proved
by induction on the iterations used to compute the μ-calculus formula.

Proposition 8 Given an absorbing target set $R \subseteq S$ and $\mathcal{G} : S \mapsto 2^{Acts}$, let
$U$ be the largest subset of states of $S$ that satisfies the following two properties:

• For all $s \in U \setminus R$ and all $a \in \mathcal{G}(s)$, we have $\mathit{dest}(s, a) \subseteq U$.

• For all $s \in U$, there is a path from $s$ to $R$ in the graph $(U, E)$, where
$E = \{(s, t) \in U \times U \mid \exists a \in A(s) \,.\, [\mathit{dest}(s, a) \subseteq U \wedge t \in \mathit{dest}(s, a)]\}$.

Then, $U = \mathit{Reach}(R, \mathcal{G})$.
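
The nested fixpoint in (14) can be evaluated by straightforward iteration: the outer greatest fixpoint starts from all states and shrinks, and for each outer iterate the inner least fixpoint starts from the empty set and grows. The Python sketch below follows this scheme; the encoding, the function names, and the example are assumptions of this sketch, and `trans[(s, a)]` is assumed to list exactly the states in $\mathit{dest}(s, a)$.

```python
def fa_pre(Y, X, states, actions, trans, G):
    """States satisfying FApre(Y, X, G).

    Every action in G(s) must stay inside Y, and some enabled action
    must stay inside Y while reaching X with positive probability.
    """
    result = set()
    for s in states:
        fair_ok = all(set(trans[(s, a)]) <= Y for a in G.get(s, ()))
        progress = any(
            set(trans[(s, a)]) <= Y and bool(set(trans[(s, a)]) & X)
            for a in actions[s]
        )
        if fair_ok and progress:
            result.add(s)
    return result


def reach(R, states, actions, trans, G):
    """Evaluate nu Y . mu X . (FApre(Y, X, G) or R) by iteration."""
    R = set(R)
    Y = set(states)  # greatest fixpoint: start from all states
    while True:
        X = set()  # least fixpoint: start from the empty set
        while True:
            X_next = fa_pre(Y, X, states, actions, trans, G) | R
            if X_next == X:
                break
            X = X_next
        if X == Y:
            return Y
        Y = X


# Hypothetical example: from s0, action "a" may reach the goal, while
# action "b" leads to a sink from which the goal is unreachable.
states = ["s0", "s1", "goal", "sink"]
actions = {"s0": {"a", "b"}, "s1": {"a"}, "goal": {"stay"}, "sink": {"stay"}}
trans = {
    ("s0", "a"): {"goal": 0.5, "s0": 0.5},
    ("s0", "b"): {"sink": 1.0},
    ("s1", "a"): {"s0": 1.0},
    ("goal", "stay"): {"goal": 1.0},
    ("sink", "stay"): {"sink": 1.0},
}
```

In the example, with no fairness constraint the result is `{"s0", "s1", "goal"}`; once `"b"` is made a fair action at `"s0"`, the sink cannot be avoided and only `{"goal"}` remains.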


6.1 Probabilistic fairness

The following proposition establishes that $\mathit{Reach}(R, \mathcal{F})$ is the set of states
from which the minimum cost to $R$ converges.

Proposition 9 [8] We have $v_s^-(\mathit{ProbF}, c, R) < \infty$ iff $s \in \mathit{Reach}(R, \mathcal{F})$.

Proof. In one direction, Proposition 9 follows easily from Proposition 8.
In fact, consider the policy that at each $t \in U$ chooses the action from
$\{ a \in A(t) \mid \mathit{dest}(t, a) \subseteq U \}$ uniformly at random. Under this policy, $R$ is
reached with probability 1 and within finite expected time from all $s \in U$,
ensuring the convergence of the minimum cost. In the other direction, an inductive
argument that follows the structure of (14) shows that if $s \notin U$, then
$\Pr_s^{\pi}(\Diamond R) < 1$ for all $\pi \in \mathit{ProbF}$ (see [13] for related arguments), which leads
to the result. □

For all $s \in U$, it is possible to compute the minimum cost to $R$ under
no fairness assumptions, $v_s^-(\mathit{NoF}, c, R)$, by solving a stochastic shortest path
problem [4]. The following result states that this cost is equal to the cost
$v_s^-(\mathit{ProbF}, c, R)$ under probabilistic fairness. Together with Proposition 9,
this yields an algorithm for the computation of $v_s^-(\mathit{ProbF}, c, R)$ for all $s \in S$.

Proposition 10 For all $s \in \mathit{Reach}(R, \mathcal{F})$ we have

$$v_s^-(\mathit{ProbF}, c, R) = v_s^-(\mathit{NoF}, c, R) .$$

Proof. To prove this result, we again use the idea of approximating the (possibly
unfair) policy corresponding to the stochastic shortest path problem with
a set of probabilistically fair policies. To this end, let $U = \mathit{Reach}(R, \mathcal{F})$, and
let $\pi_d$ be a memoryless policy such that for all $s \in U$ we have $v_s^{\pi_d}(c, R) =
v_s^-(\mathit{NoF}, c, R)$ (for the existence of such a policy, see [4]). Let also $\pi_u$ be
any memoryless policy that at all $s \in U$ chooses an action from $\{ a \in A(s) \mid
\mathit{dest}(s, a) \subseteq U \}$ uniformly at random. For $0 \le x \le 1$, we define the memoryless
policy $\pi[x]$ by, for all $s \in S$ and $a \in A(s)$,

$$\pi[x](s)(a) = (1 - x)\, \pi_d(s)(a) + x\, \pi_u(s)(a) . \tag{15}$$

Note that for $0 < x \le 1$ we have $\pi[x] \in \mathit{ProbF}$. We want to show that for all
$s \in U$, we have

$$\lim_{x \to 0} v_s^{\pi[x]}(c, R) = v_s^{\pi_d}(c, R) . \tag{16}$$

From this equation, Proposition 10 follows easily. To show (16), first observe
that it suffices to focus on the set $V = U \setminus R$, since neither $\pi_d$ nor $\pi_u$ leads
from $U$ to outside $U$, and since the reachability cost from $R$ is 0. Denote
by $P(x) = [p_{s,t}(x)]_{s,t \in V}$ the probability transition matrix corresponding to
the policy $\pi[x]$ restricted to the set $V$, and note that $P(0)$ is the probability
transition matrix corresponding to $\pi_d$. For $0 \le x \le 1$ define also the vector
$z(x) = [z_s(x)]_{s \in V}$ by

$$z_s(x) = \sum_{a \in A(s)} \pi[x](s)(a)\, c(s, a) .$$

With this notation, from (2) for $s \in V$ and $0 \le x \le 1$ we have

$$v^{\pi[x]}(c, R) = \sum_{k=0}^{\infty} P^k(x)\, z(x) = (I - P(x))^{-1} z(x) .$$

Since for $0 \le x \le 1$ the matrix $P(x)$ corresponds to a transient Markov chain,
we have $\det(I - P(x)) \neq 0$ in this interval. Thus, for $0 \le x \le 1$ the coefficients
of $(I - P(x))^{-1}$ are rational functions of $x$ that have no poles in the interval
$[0, 1]$. Since also $z(x)$ is continuous in $[0, 1]$, we finally have

$$\lim_{x \to 0} v^{\pi[x]}(c, R) = \lim_{x \to 0} (I - P(x))^{-1} z(x) = (I - P(0))^{-1} z(0) = v^{\pi_d}(c, R) ,$$

thus proving (16). □
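The convergence argument can be checked numerically on a small instance. The sketch below uses a hypothetical MDP that is not taken from the paper: $V = \{s_0, s_1\}$, all costs equal to 1, $\pi_d$ moves deterministically from $s_0$ to $s_1$ and from $s_1$ to $R$, and the uniform policy $\pi_u$ mixes in a self-loop at $s_0$ and a step back from $s_1$ to $s_0$. The helper names `solve` and `cost_vector` are illustrative, and the linear system is solved with a plain Gauss-Jordan elimination so that no external library is assumed.

```python
def solve(A, b):
    """Solve A v = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def cost_vector(x):
    """Return (I - P(x))^{-1} z(x) for the two-state toy MDP."""
    # pi[x] picks the pi_d action with probability 1 - x/2 in each state.
    P = [[x / 2, 1 - x / 2],   # s0: self-loop, or move on to s1
         [x / 2, 0.0]]         # s1: step back to s0, or leave V towards R
    z = [1.0, 1.0]             # unit cost for every state-action pair
    I_minus_P = [[(1.0 if i == j else 0.0) - P[i][j] for j in range(2)]
                 for i in range(2)]
    return solve(I_minus_P, z)


for x in (0.5, 0.1, 0.001, 0.0):
    print(x, cost_vector(x))
```

For this instance the costs under $\pi_d$ are 2 from $s_0$ and 1 from $s_1$, and the printed vectors approach these values as $x$ decreases.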
6.2 Unbounded fairness

The equivalent of Proposition 9 can also be proved for unbounded fairness.

Proposition 11 [8] We have $v_s^-(\mathit{UnbF}, c, R) < \infty$ iff $s \in \mathit{Reach}(R, \mathcal{F})$.

The rest of the analysis for the proof of Proposition 10 can then be carried
through unchanged, observing that for all $0 < x \le 1$ the policy $\pi[x]$ defined
by (15) is such that $\pi[x] \in \mathit{UnbF}$. Hence, we obtain the following result.

Proposition 12 For all $s \in S$, we have $v_s^-(\mathit{UnbF}, c, R) = v_s^-(\mathit{ProbF}, c, R)$.


6.3  Path  fairness 

With respect to reachability cost, path fairness behaves differently from the
other two notions of fairness. The following proposition states that

$$v_s^-(\mathit{PathF}, c, R) = v_s^-(\mathit{NoF}, c, R)$$

for all $s \in S$.

Proposition 13 Denote by $(\lambda s . \emptyset) : S \to 2^{\mathit{Acts}}$ the empty fairness constraint.
For all $s \in S$, the following assertions hold:

(i) If $s \notin \mathit{Reach}(R, \lambda s . \emptyset)$, then $v_s^-(\mathit{PathF}, c, R) = v_s^-(\mathit{NoF}, c, R) = \infty$.

(ii) If $s \in \mathit{Reach}(R, \lambda s . \emptyset)$, then $v_s^-(\mathit{PathF}, c, R) = v_s^-(\mathit{NoF}, c, R)$.

Proof. Let $U^* = \mathit{Reach}(R, \lambda s . \emptyset)$. The first assertion is shown by proving that
if $s \notin U^*$ then $\Pr_s^{\pi}(\Diamond R) < 1$ for all policies $\pi$, so that $v_s^{\pi}(c, R) = \infty$ for all
policies $\pi$. This result is proved using an inductive argument on the iterations
of (14).

For $s \in U^*$, the second assertion can be proved as follows. Let $\pi_d$ be the
memoryless policy such that $v_s^{\pi_d}(c, R) = v_s^-(\mathit{NoF}, c, R)$. Define $\pi_e$ to be the
(history-dependent) policy that coincides with $\pi_d$ until $R$ is reached, and that
chooses actions uniformly at random after $R$ is reached. We have $\pi_e \in \mathit{PathF}$:
in fact, under policy $\pi_d$ any path that reaches $R$ is fair, and the set of paths
that never reach $R$ has probability measure 0. It is then immediate to check
that $v_s^{\pi_d}(c, R) = v_s^{\pi_e}(c, R)$, leading to the result. □

6.4  Fairness  and  reachability 

Together, Propositions 9, 10, 12, and 13 prove Theorem 2, part (ii). Intuitively,
Theorem 2, part (ii) can be interpreted as follows. Let

$$U = \mathit{Reach}(R, \mathcal{F}) , \qquad U^* = \mathit{Reach}(R, \lambda s . \emptyset) .$$

If $U = U^*$, then under all three notions of fairness we can achieve a cost to $R$
that is arbitrarily close to that achieved by the optimal (not necessarily fair)
policy. If $U \subset U^*$, on the other hand, the inequality in Theorem 2, part (ii)
is strict for some $s \in U^* \setminus U$. In this latter case, the difference between
the behavior of probabilistic and unbounded fairness on one side, and path
fairness on the other, is essentially due to the following phenomenon. Suppose
that from a state $s$, in order to reach $R$, a path must visit a state $t$, with
$A(t) = \{a, b\}$. From $t$, action $a$ leads to $R$, and action $b$ leads to a set of states
that cannot reach $R$. Probabilistic and unbounded fairness require that a
policy be fair at all steps. Hence, under a probabilistically or unboundedly fair
policy, action $b$ must be selected with non-zero probability, and the expected
cost to $R$ will be infinite. On the other hand, path fairness does not impose
requirements on all steps of the paths. As long as a policy visits $t$ only finitely
often (which is the case here), the policy can deterministically select $a$ at $t$,
and the expected cost to $R$ will converge.

6.5  A  counterexample  to  equality 

To see that the inequality in Theorem 2, part (ii) cannot in general be replaced
by equality, consider the MDP $\mathcal{P}$ of Figure 3. Let $c$ be the cost function that
associates 1 with all state-action pairs of the MDP, and let $R = \{t_2\}$. We
have $v_{s_1}^-(\mathit{NoF}, c, R) = 1$ and $v_{s_1}^-(\mathit{ProbF}, c, R) = \infty$.

7  Long-Run  Average  Outcome 

Before dealing with the case of general MDPs, we prove that the three notions
of fairness lead to the same maximum long-run average outcome, provided
the MDP is strongly connected. We say that the MDP $\mathcal{P} = (S, \mathit{Acts}, A, p)$
is strongly connected if the graph $(S, E)$ is strongly connected, where $E =
\{ (s, t) \in S \times S \mid \exists a \in A(s) \,.\, p(s, a)(t) > 0 \}$. The following proposition
summarizes several results for strongly connected MDPs.

Proposition 14 [8, §5] Consider a strongly connected MDP $\mathcal{P}$ with state
space $S$, together with outcome and cost functions $R$, $W$. The following assertions
hold.

• The value of $H_s(\mathit{NoF}, R, W)$ does not depend on $s \in S$. The common
value $H(\mathit{NoF}, R, W)$ can be computed in time polynomial in the size of $\mathcal{P}$
by solving a linear programming problem.

• There is a memoryless policy $\pi_d$ such that $H_s(\mathit{NoF}, R, W) = H_s^{\pi_d}(R, W)$
for all $s \in S$. Moreover, the transition probability matrix $P_d$ induced by $\pi_d$
corresponds to a Markov chain having a single closed recurrent class.
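The strong-connectivity condition on the graph $(S, E)$ can be tested with two graph searches from an arbitrary state, one along the edges and one against them. The Python sketch below assumes a representation that is not part of the paper: `p[s][a]` is a dictionary from successor states to probabilities, every successor is itself a key of `p`, and the function name and both example MDPs are hypothetical.

```python
def is_strongly_connected(p):
    """Check strong connectivity of (S, E), where (s, t) is in E iff
    some action a in A(s) has p(s, a)(t) > 0."""
    states = set(p)
    if not states:
        return True
    succ = {s: {t for dist in p[s].values() for t, pr in dist.items() if pr > 0}
            for s in states}
    pred = {s: set() for s in states}
    for s, targets in succ.items():
        for t in targets:
            pred[t].add(s)

    def reachable(start, edges):
        seen, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for v in edges[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        return seen

    root = next(iter(states))
    # Every state is reachable from root, and root is reachable from every state.
    return reachable(root, succ) == states and reachable(root, pred) == states


connected = {"u": {"stay": {"u": 1.0}, "go": {"v": 1.0}},
             "v": {"back": {"u": 1.0}}}
trap = {"u": {"go": {"v": 1.0}},
        "v": {"loop": {"v": 1.0}}}
print(is_strongly_connected(connected), is_strongly_connected(trap))
```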

Using this proposition, we can show that the maximum long-run average outcomes
coincide for our three notions of fairness on strongly connected MDPs.

Proposition 15 On a strongly connected MDP, for all $s \in S$ and
$\Phi \in \{ \mathit{ProbF}, \mathit{PathF}, \mathit{UnbF} \}$, we have

$$H_s(\Phi, R, W) = H(\mathit{NoF}, R, W) .$$


Proof. The proof of this proposition is once more based on approximating
the optimal policy in the absence of fairness with a set of fair policies. Let $\pi_d$
be as in Proposition 14. For $0 \le x \le 1$, all $s \in S$ and all $a \in A(s)$, we define
the memoryless policy $\pi[x]$ by

$$\pi[x](s)(a) = (1 - x)\, \pi_d(s)(a) + x\, \pi_f(s)(a) .$$

For $0 < x \le 1$, we have $\pi[x] \in \mathit{ProbF}$. For $0 \le x \le 1$, denote by $P(x)$
the transition probability matrix arising from $\pi[x]$, and define the vectors
$r(x) = [r_s(x)]_{s \in S}$ and $w(x) = [w_s(x)]_{s \in S}$ by

$$r_s(x) = \sum_{a \in A(s)} R(s, a)\, \pi[x](s)(a) , \qquad w_s(x) = \sum_{a \in A(s)} W(s, a)\, \pi[x](s)(a) .$$

Denote by $P^*(x) = [p^*_{s,t}(x)]_{s,t \in S}$ the steady-state probability distribution matrix
corresponding to $P(x)$. By our choice of $\pi_d$, the Markov chain corresponding
to $P(0)$ has a single closed recurrent class $C \subseteq S$. Since the MDP
is strongly connected, by definition of $\pi[x]$ all states of $C$ are surely recurrent
for $0 < x \le 1$. Hence, as a consequence of standard facts on Markov chains
we have

$$H_s^{\pi[x]}(R, W) = \frac{\sum_{t \in S} p^*_{s,t}(x)\, r_t(x)}{\sum_{t \in S} p^*_{s,t}(x)\, w_t(x)} .$$

Moreover, Proposition 4 ensures that $\lim_{x \to 0} P^*(x) = P^*(0)$. Since for all $t \in S$ the
quantities $r_t(x)$ and $w_t(x)$ are continuous for $x \to 0$, we have

$$\lim_{x \to 0} H_s^{\pi[x]}(R, W) = H(\mathit{NoF}, R, W) .$$

Hence, for all $s \in S$ we have $H_s(\mathit{ProbF}, R, W) \ge H(\mathit{NoF}, R, W)$. Since the
reverse inequality is immediate, we conclude

$$H_s(\mathit{ProbF}, R, W) = H(\mathit{NoF}, R, W)$$

as was to be shown. The equivalent results for $\mathit{PathF}$ and $\mathit{UnbF}$ then follow
immediately by observing that $\mathit{ProbF} \subseteq \mathit{PathF}$ and $\mathit{ProbF} \subseteq \mathit{UnbF}$. □
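The ratio formula for the long-run average outcome can be illustrated on a two-state chain, where the steady-state distribution has a closed form. The example below is hypothetical and not taken from the paper: state `u` has a self-loop with outcome 1 and an action leading to `v` with outcome 0, state `v` has a single action leading back to `u` with outcome 0, all costs $W$ are 1, $\pi_d$ always takes the self-loop, and the fair policy is assumed to be uniform.

```python
def long_run_average(x):
    """Ratio of steady-state outcome to steady-state cost under pi[x]
    for the two-state toy MDP described above."""
    a = x / 2      # probability of moving u -> v under pi[x]
    b = 1.0        # probability of moving v -> u (only action at v)
    # Stationary distribution of the chain [[1-a, a], [b, 1-b]].
    steady = {"u": b / (a + b), "v": a / (a + b)}
    r = {"u": 1.0 * (1 - x / 2) + 0.0 * (x / 2), "v": 0.0}
    w = {"u": 1.0, "v": 1.0}
    num = sum(steady[t] * r[t] for t in steady)
    den = sum(steady[t] * w[t] for t in steady)
    return num / den


for x in (1.0, 0.1, 0.001):
    print(x, long_run_average(x))
```

In this instance the optimal value without fairness is 1 (stay in `u` forever), and the printed values increase towards 1 as $x$ decreases, while remaining strictly below 1 for every $x > 0$.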


7.1 Probabilistic fairness

If the MDP $\mathcal{P}$ is strongly connected, Propositions 14 and 15 provide a method
for the computation of $H_s(\mathit{ProbF}, R, W)$ at all $s \in S$. In the general case,
from (4) we see that the expected long-run average outcome depends only on
the states and actions that are repeated infinitely often. Theorem 4 states
that these states and actions form a FEC with probability 1: hence, we can
concentrate our attention on the maximal FECs. Let

$$\mathit{MFec}(S, \mathcal{F}) = \mathcal{C} = \{ (C_1, D_1), \ldots, (C_n, D_n) \} ,$$

and note that for $1 \le i \le n$, the FEC $(C_i, D_i)$ is a strongly connected sub-MDP
of the global MDP. Hence, for $1 \le i \le n$ we can associate with $(C_i, D_i)$
the maximum long-run average outcome $H^i(\mathit{NoF}, R, W)$ that can be obtained
when staying forever in $(C_i, D_i)$, computed using Propositions 14 and 15.

Once the maximum long-run average outcomes for the maximal FECs have
been computed, we can compute $H_s(\mathit{ProbF}, R, W)$ at all $s \in S$ using an idea
that originates from [6]. For all $1 \le i \le n$, we add to the MDP a special state $t_i$,
which signals the intention to stay in $(C_i, D_i)$ forever. For $1 \le i \le n$, we
let $A(t_i) = \{b_i\}$, where $b_i$ is an action that leads back to $t_i$, i.e. $\mathit{dest}(t_i, b_i) = \{t_i\}$.
The set of states $\{t_1, \ldots, t_n\}$ is thus absorbing. For all $1 \le i \le n$ and all
$s \in C_i$, we also add to $A(s)$ a new action $a_i$ that leads deterministically to $t_i$:
the choice of $a_i$ represents the decision of staying in $(C_i, D_i)$ from that point
on. Finally, we associate with each state $s \in S \cup \{t_1, \ldots, t_n\}$ and $a \in A(s)$ of
the new MDP a final reward $h(s, a)$ defined by

$$h(s, a) = \begin{cases} H^i(\mathit{NoF}, R, W) & \text{if } s \in C_i \text{ and } a = a_i, \text{ for } 1 \le i \le n; \\ 0 & \text{otherwise.} \end{cases}$$

For $1 \le i \le n$, the reward associated with a transition from $C_i$ to $t_i$ is
thus equal to the maximum long-run average reward that can be obtained by
staying in $(C_i, D_i)$ forever; the reward $h$ is 0 on all other transitions.
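The construction of the extended MDP is mechanical and can be sketched in a few lines of Python. The representation is an assumption made for illustration only: `dest[s][a]` is the destination set of action `a` at state `s`, `fecs` is the list of pairs $(C_i, D_i)$, `values[i-1]` holds the precomputed value $H^i(\mathit{NoF}, R, W)$, and the generated names `t1`, `a1`, `b1`, ... are assumed not to clash with existing states or actions.

```python
def add_stay_states(dest, fecs, values):
    """Return the extended destination map and the final reward h."""
    new_dest = {s: dict(actions) for s, actions in dest.items()}
    bonus = {}
    for i, (C, _D) in enumerate(fecs, start=1):
        t_i, a_i, b_i = f"t{i}", f"a{i}", f"b{i}"
        new_dest[t_i] = {b_i: {t_i}}       # b_i loops on t_i, so t_i is absorbing
        for s in C:
            new_dest[s][a_i] = {t_i}       # a_i commits to staying in (C_i, D_i)
            bonus[(s, a_i)] = values[i - 1]

    def h(s, a):
        """Final reward: H^i on the new actions a_i, 0 everywhere else."""
        return bonus.get((s, a), 0.0)

    return new_dest, h


# Hypothetical two-state example with one FEC per state.
dest = {"s": {"a": {"s"}, "c": {"q"}}, "q": {"d": {"q"}}}
fecs = [({"s"}, {"s": {"a"}}), ({"q"}, {"q": {"d"}})]
new_dest, h = add_stay_states(dest, fecs, values=[0.5, 0.25])
print(new_dest["s"], h("s", "a1"), h("s", "a"))
```

The function copies the action maps before extending them, so the original MDP is left unchanged.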

Denote by $\mathcal{P}[(C_1, D_1), \ldots, (C_n, D_n)]$ the MDP obtained from $\mathcal{P}$ in this
fashion. The following proposition states that the maximum long-run average
outcome $H_s(\mathit{ProbF}, R, W)$ at all $s \in S$ can be computed by solving a maximum
expected total reward problem on $\mathcal{P}[(C_1, D_1), \ldots, (C_n, D_n)]$, using $h$ as
the reward.


Proposition 16 Let the MDP $\mathcal{P}[(C_1, D_1), \ldots, (C_n, D_n)]$ and the reward $h$
be as described above. Then, for all $s \in S$ we have:

$$H_s(\mathit{ProbF}, R, W) = \max_{\pi} E_s^{\pi} \Bigl\{ \sum_{k=0}^{\infty} h(X_k, Y_k) \Bigr\} , \tag{17}$$

where the max in (17) exists.

The maximum expected total cost problem mentioned in the proposition can be solved
in several ways: see for example [3] or, for more efficient algorithms tailored
to this type of problem, [8, §7] and [12].

Proof. On the one hand, consider a memoryless policy $\pi_e$ for the MDP
$\mathcal{P}[(C_1, D_1), \ldots, (C_n, D_n)]$ that realizes the maximum in the total cost problem
(17).

For $1 \le i \le n$, we can assume that if $\pi_e$ chooses with positive probability
action $a_i$ at some state $s \in C_i$, then it chooses $a_i$ deterministically at all states
of $C_i$. In fact, assume towards a contradiction that at $t \in C_i$ there is a
strictly better choice from the point of view of total cost. Since $(C_i, D_i)$ is
strongly connected, a strictly better policy would be obtained by choosing
all actions in $D_i$ uniformly at random at all states of $C_i \setminus \{t\}$, until $t$ is reached,
and choosing the better choice at $t$, contradicting the hypothesis that $\pi_e$ is
optimal.

For $0 < x < 1$, from $\pi_e$ we construct a memoryless policy $\pi[x]$ for $\mathcal{P}$ as
follows. Policy $\pi[x]$ coincides with $\pi_e$ on all $S \setminus \bigcup_{i=1}^{n} C_i$. For $1 \le i \le n$, if $\pi_e$
does not choose $a_i$ at $C_i$, then $\pi[x]$ coincides with $\pi_e$ also on $C_i$. If $\pi_e$ chooses
$a_i$ in $C_i$, for $1 \le i \le n$, then we take $\pi[x]$ to coincide with the probabilistically
fair $x$-optimal policy for $(C_i, D_i)$, constructed as in the proof of Proposition 15.

On the basis of $\pi[x]$, for $0 < \varepsilon < 1$ and $0 < x < 1$ we construct a
memoryless policy $\pi[\varepsilon, x]$ by, for all $s \in S$ and $a \in A(s)$,

$$\pi[\varepsilon, x](s)(a) = \begin{cases} \pi[x](s)(a) & \text{if } s \in \bigcup_{i=1}^{n} C_i; \\ (1 - \varepsilon)\, \pi_e(s)(a) + \varepsilon\, \pi_f(s)(a) & \text{otherwise.} \end{cases}$$

Using arguments similar to those for Propositions 10 and 15, it is not difficult
to prove that for all $s \in S$, we have

$$\lim_{\varepsilon \to 0} \lim_{x \to 0} H_s^{\pi[\varepsilon, x]}(R, W) = E_s^{\pi_e} \Bigl\{ \sum_{k=0}^{\infty} h(X_k, Y_k) \Bigr\} ,$$

which leads to the result.

In the other direction, consider an arbitrary probabilistically fair policy $\pi$.
Under policy $\pi$, the paths are with probability 1 eventually confined to some
$(C_i, D_i)$ with $1 \le i \le n$. Once confined in $(C_i, D_i)$, it is possible to prove that
$\pi$ cannot do better than $H^i(\mathit{NoF}, R, W)$ (see [8] for a detailed argument).
Hence, for all $s \in S$ we have

$$H_s^{\pi}(R, W) \le \sum_{i=1}^{n} H^i(\mathit{NoF}, R, W)\, \Pr_s^{\pi} \bigl( \mathit{infSA} \subseteq \mathit{SAPairs}(C_i, D_i) \bigr) ,$$

and from this the result follows easily. □


7.2 Path and unbounded fairness

As with probabilistic fairness, under path fairness the set of state-action
pairs that are repeated infinitely often along a path forms a FEC with
probability 1. Hence, we can repeat for path fairness the same reasoning done
in the previous subsection for probabilistic fairness. From the equality of the
algorithms for the computation of the maximum long-run average outcome for
these two notions of fairness, we obtain that for all $s \in S$,

$$H_s(\mathit{ProbF}, R, W) = H_s(\mathit{PathF}, R, W) , \tag{18}$$

which is one part of Theorem 2, part (iii).

For unbounded fairness, Proposition 2 tells us that a path that enters an
EC can stay forever in the EC with probability arbitrarily close to 1, even if
the EC is not fair. This suggests that for dealing with unbounded fairness, the
only modification needed to the algorithm of the previous section is to take
$\mathcal{C} = \mathit{Mec}(S)$ instead of $\mathcal{C} = \mathit{MFec}(S)$, thus considering all ECs, including the
unfair ones. This intuition is confirmed by the following proposition.

Proposition 17 We have $H_s(\mathit{UnbF}, R, W) = H_s(\mathit{NoF}, R, W)$ for all $s \in S$.

Proof. The inequality $H_s(\mathit{UnbF}, R, W) \le H_s(\mathit{NoF}, R, W)$ holds trivially for
all $s \in S$, since every unboundedly fair policy is in particular a policy. To show the converse inequality, the key step is to show that, given
an EC $(C, D)$, we have a set of policies $\pi[x]$ such that, for all $t \in C$,

$$\lim_{x \to 0} H_t^{\pi[x]}(R, W) = H_t(\mathit{NoF}, R, W)[(C, D)] \tag{19}$$

$$\lim_{x \to 0} \Pr_t^{\pi[x]} \bigl( \forall k \ge 0 \,.\, (X_k \in C \wedge Y_k \in D(X_k)) \bigr) = 1 , \tag{20}$$

where $H_t(\mathit{NoF}, R, W)[(C, D)]$ refers to the maximum long-run average outcome
that can be obtained on the EC $(C, D)$, rather than on the whole
MDP. To this end, let $\pi_d$ be a memoryless policy such that $H_t^{\pi_d}(R, W) =
H_t(\mathit{NoF}, R, W)[(C, D)]$ for all $t \in C$. By definition, we have that

$$\Pr_t^{\pi_d} \bigl( \forall k \ge 0 \,.\, (X_k \in C \wedge Y_k \in D(X_k)) \bigr) = 1 .$$

For $0 < x < 1$, construct the policy $\pi[x]$ by

$$\pi[x](s_1, \ldots, s_k) = (1 - x)^{1/2^k}\, \pi_d(s_k) + \bigl( 1 - (1 - x)^{1/2^k} \bigr)\, \pi_f(s_k)$$

for all $k \ge 1$ and all $s_1, \ldots, s_k \in C$, and by $\pi[x](s_1, \ldots, s_k) = \pi_f(s_k)$ for $k \ge 1$
and $s_k \notin C$. A straightforward calculation shows that

$$\Pr_t^{\pi[x]} \bigl( \forall k \ge 0 \,.\, (X_k \in C \wedge Y_k \in D(X_k)) \bigr) = 1 - x ,$$

which shows (20). In addition, notice that policy $\pi[x]$ is a linear combination
of $\pi_d$ and $\pi_f$ that is always at least as close to $\pi_d$ as $(1 - x)\pi_d + x\pi_f$. Hence,
(19) follows from the same arguments used to prove Proposition 15.
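The calculation behind the value $1 - x$ rests on the identity $\prod_{k \ge 1} (1 - x)^{1/2^k} = (1 - x)^{\sum_{k \ge 1} 2^{-k}} = 1 - x$. The short Python sketch below checks this numerically; it only multiplies the weights placed on $\pi_d$ at each step, and the function name `stay_probability` is illustrative.

```python
def stay_probability(x, steps):
    """Product of the weights (1 - x)^(1 / 2^k) for k = 1..steps,
    i.e. the probability that the pi_d component is used at every
    one of the first `steps` steps."""
    prob = 1.0
    for k in range(1, steps + 1):
        prob *= (1 - x) ** (1 / 2 ** k)
    return prob


for steps in (1, 5, 20, 60):
    print(steps, stay_probability(0.3, steps))
```

The partial products decrease monotonically towards $1 - x$, here 0.7, so the weight on $\pi_f$ vanishes fast enough along the path for the total deviation probability to stay bounded by $x$.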

Once (19) and (20) have been proved, the result follows from considering
the MDP $\mathcal{P}[(C_1, D_1), \ldots, (C_n, D_n)]$ obtained as for Proposition 16, except that
$(C_1, D_1), \ldots, (C_n, D_n)$ are the ECs (instead of the FECs) of the original MDP,
and that the final rewards are defined by $h(s, a) = H_s(\mathit{NoF}, R, W)[(C, D)]$ for
all ECs $(C, D)$ and all $s \in C$, and $h(s, a) = 0$ otherwise. The result can be
obtained by reasoning as in the proof of Proposition 16. □


7.3  A  counterexample  to  equality 

To see that the inequality in Theorem 2, part (iii) cannot in general be replaced
by equality, consider the MDP $\mathcal{P}$ of Figure 3. We consider two functions $R$
and $W$, such that $W$ is equal to 1 for all state-action pairs, and $R$ is defined
by $R(s_1, a) = R(s_2, c) = R(s_3, d) = 0$ and $R(s_2, b) = 1$. Then, it is easy to
check that $H_{s_1}(\mathit{NoF}, R, W) = 1$ and $H_{s_1}(\mathit{ProbF}, R, W) = 0$.